NEW SEQUENCES OF HEPATITIS C VIRUS GENOTYPES AND THEIR USE AS
THERAPEUTIC AND DIAGNOSTIC AGENTS 



The invention relates to new sequences of hepatitis C virus (HCV) genotypes and their use
as therapeutic and diagnostic agents.

The present invention relates to new nucleotide and amino acid sequences corresponding
to the coding region of a new type 2 subtype 2d, type-specific sequences corresponding to
HCV type 3a, to new sequences corresponding to the coding region of a new subtype 3c, and
to new sequences corresponding to the coding region of HCV type 4 and type 5 subtype 5a;
a process for preparing them, and their use for diagnosis, prophylaxis and therapy.

The technical problem underlying the present invention is to provide new type-specific
sequences of the Core, the E1, the E2, the NS3, the NS4 and the NS5 regions of HCV type
4 and type 5, as well as of new variants of HCV types 2 and 3. These new HCV sequences
are useful to diagnose the presence of type 2 and/or type 3 and/or type 4 and/or type 5 HCV
genotypes in a biological sample. Moreover, the availability of these new type-specific
sequences can increase the overall sensitivity of HCV detection and should also prove to be
useful for therapeutic purposes.

Hepatitis C viruses (HCV) have been found to be the major cause of non-A, non-B
hepatitis. The sequences of cDNA clones covering the complete genome of several prototype
isolates have been determined (Kato et al., 1990; Choo et al., 1991; Okamoto et al., 1991;
Okamoto et al., 1992). Comparison of these isolates shows that the variability in nucleotide
sequences can be used to distinguish at least 2 different genotypes, type 1 (HCV-1 and HCV-J)
and type 2 (HC-J6 and HC-J8), with an average homology of about 68%. Within each
type, at least two subtypes exist (e.g. represented by HCV-1 and HCV-J), having an average
homology of about 79%. HCV genomes belonging to the same subtype show average
homologies of more than 90% (Okamoto et al., 1992). However, the partial nucleotide
sequence of the NS5 region of the HCV-T isolates showed at most 67% homology with the
previously published sequences, indicating the existence of yet another HCV type (Mori et
al., 1992). Parts of the 5' untranslated region (UR), core, NS3, and NS5 regions of this type
3 have been published, further establishing the similar evolutionary distances between the 3
major genotypes and their subtypes (Chan et al., 1992).

The identification of type 3 genotypes in clinical samples can be achieved by means of 
PCR with type-specific primers for the NS5 region. However, the degree to which this will



SUBSTITUTE SHEET (RULE 26) 



WO 94/25601

PCT/EP94/01323 


be successful is largely dependent on sequence variability and on the virus titer present in the
serum. Therefore, routine PCR in the open reading frame, especially for type 3 and the new
type 4 and 5 described in the present invention and/or group V (Cha et al., 1992) genotypes,
can be predicted to be unsuccessful. A new typing system (LiPA), based on variation in the
highly conserved 5' UR, proved to be more useful because the 5 major HCV genotypes and
their subtypes can be determined (Stuyver et al., 1993). The selection of high-titer isolates
makes it possible to obtain PCR fragments for cloning with only 2 primers, while nested PCR
requires that 4 primers match the unknown sequences of the new type 3, 4 and 5 genotypes.

New sequences of the 5' untranslated region (5'UR) have been listed by Bukh et al.
(1992). For some of these, the E1 region has recently been described (Bukh et al., 1993).
Isolates with similar sequences in the 5'UR to a group of isolates including DK12 and HK10
described by Bukh et al. (1992) and E-b1 to E-b8 described and classified as type 3 by Chan
et al. (1991), have been reported and described in the 5'UR, the carboxyterminal part of E1,
and in the NS5 region as group IV by Cha et al. (1992; WO 92/19743), and have also been
described in the 5'UR for isolate BRio and classified as type 3 by the inventors of this
application (Stuyver et al., 1993).

The aim of the present invention is to provide new HCV nucleotide and amino acid
sequences enabling the detection of HCV infection.

Another aim of the present invention is to provide new nucleotide and amino acid HCV
sequences enabling the classification of infected biological fluids into different serological
groups unambiguously linked to types and subtypes at the genome level.

Another aim of the present invention is to provide new nucleotide and amino acid HCV
sequences ameliorating the overall HCV detection rate.

Another aim of the present invention is to provide new HCV sequences, useful for the
design of HCV vaccine compositions.

Another aim of the present invention is to provide a pharmaceutical composition consisting
of antibodies raised against the polypeptides encoded by these new HCV sequences, for
therapy or diagnosis.

The present invention relates more particularly to a composition comprising or consisting
of at least one polynucleic acid containing at least 5, and preferably 8 or more contiguous
nucleotides selected from at least one of the following HCV sequences:

- an HCV type 3 genomic sequence, more particularly in any of the following
regions:






  - the region spanning positions 417 to 957 of the Core/E1 region of HCV
    subtype 3a,
  - the region spanning positions 4664 to 4730 of the NS3 region of HCV type 3,
  - the region spanning positions 4892 to 5292 of the NS3/4 region of HCV
    type 3,
  - the region spanning positions 8023 to 8235 of the NS5 region of the BR36
    subgroup of HCV subtype 3a,
  - an HCV subtype 3c genomic sequence,
  more particularly the coding regions of the above-specified regions;

- an HCV subtype 2d genomic sequence, more particularly the coding region of HCV
subtype 2d;

- an HCV type 4 genomic sequence, more particularly the coding region, more particularly
the coding region of subtypes 4a, 4e, 4f, 4g, 4h, 4i, and 4j;

- an HCV type 5 genomic sequence, more particularly the coding region of HCV type 5,
more particularly the regions encoding Core, E1, E2, NS3, and NS4;

with said nucleotide numbering being with respect to the numbering of HCV nucleic acids
as shown in Table 1, and with said polynucleic acids containing at least one nucleotide
difference with known HCV (type 1, type 2, and type 3) polynucleic acid sequences in the
above-indicated regions, or the complement thereof.

It is to be noted that the nucleotide difference in the polynucleic acids of the invention may
or may not involve an amino acid difference in the corresponding amino acid sequences coded
by said polynucleic acids.



According to a preferred embodiment, the present invention relates to a composition
comprising or containing at least one polynucleic acid encoding an HCV polyprotein, with 
said polynucleic acid containing at least 5, preferably at least 8 nucleotides corresponding to 
at least part of an HCV nucleotide sequence encoding an HCV polyprotein, and with said 
HCV polyprotein containing in its sequence at least one of the following amino acid residues: 
L7, Q43, M44, S60, R67, Q70, T71, A79, A87, N106, K115, A127, A190, S130, V134,
G142, I144, E152, A157, V158, P165, S177 or Y177, I178, V180 or E180 or F182, R184,
I186, H187, T189, A190, S191 or 0191, Q192 or L192 or I192 or V192 or E192, N193 or
H193 or P193, W194 or Y194, H195, A197 or I197 or V197 or T197, V202, I203 or L203,
Q208, A210, V212, F214, T216, R217 or D217 or E217 or V217, H218 or N218, H219 or
V2I9 or U19. 1227 or 1227, M231 or E23. or Q23,, 1732 or D232 or A232 or K^3,
Q235 or 1235, A237 or 7^37, 1242, 1246, S247. S248, V2«. S250 or YyO .251 or vVi 
or M251 or F25,, D252, T254 or V254, U55 or V255, E256 or A256. M258 or Pjs'or 
V258, A260 or Q260 or 3250, A261, T264 or Y264, m:53. 1266 or ..266, A267 0^63 or 
T268, F271 or M271 or V271. 1277, M280 or H2S0, 1284 or A284 or LS4 V'74 V^,, ■ 
N292 or S292. R293 or 1293 or Y293, Q294 or R294, L297 or 097 or Q297 ■a299 or K^99 
orQ299, N303 orr.03,T308 orL308,T310orF310orA3IOorD310orV310 L313 
G.17 or Q317, U33. S351, A353, A359, A363, S364, A366, T369. L373. F376 Q3S6 
1387, S392, 1399. F402. 1403, R405, D454, A461, A463, T464, K434. Q500 E501 S". ' 
I022, H524. N528, S53I, S532, V534, F536, F337, ^"39, 1546, C1282, Ai283 H13Io' 
V13I2, Q132I, P1368, V1372, V1373, K1405, Q1406. S1409. A1424 AI4-9 Cl-" 
S1436. SI456. HI496, AI504, D1510, D1529, 11543. .N1567, D!556 NI567 Ml-"-' 
Ql=79. L1581. S1583, Fi585. V1595, E1606 or T1606, M16n, V1612 or L16P P16--0 
C1636, P1651, T1656 or 11656, L1663, V1667, V1677, A1681, H1685, E1687 GI680 
V1695, A1700, Q1704, Yl-05, A17I3, AI714or 31714. M17I3, DI7I9, A17^I or TH^i' 
R1722, A1723 or VI723, H1726 or G1726, E1730, VI732, F1735. 11736. 31737 R1738 
T1739. G1740, Q1741, K1742. Q1743, AI744, T1745, LI746, E1747 or K.747 11749 
A1750, T1751 or A1751, V1753, N1755, K1756, A1757. P1758, A1759, H176^ T1763 
YI764, P2645, A26;7. K2650, K2653 or U653, S2664, N2673, F2680. K2681 L^686 
H2692, Q2695 or U695 or 12695. V2712, F271J, V2719 or Q2719. 12722 127^4 S-T^5 
K!726, G2729. Y2735. H2739. 12748, G2746 or 12746. 12748. P2752 or K2752. P2754 or 
T2754, T2757 or P2757, with said notation being composed of a letter representing the amino
acid residue by its one-letter code, and a number representing the amino acid numbering
according to Kato et al., 1990.

Each of the above-mentioned residues can be found in any of Figures 2, 5, 7, 11 or 12
showing the new amino acid sequences of the present invention aligned with known sequences
of other types or subtypes of HCV for the Core, E1, E2, NS3, NS4, and NS5 regions.

More particularly, a polynucleic acid contained in the composition according to the present
invention contains at least 5, preferably 8, or more contiguous nucleotides corresponding to
a sequence of contiguous nucleotides selected from at least one of HCV sequences encoding
the following new HCV amino acid sequences:

- new sequences spanning amino acid positions 1 to 319 of the Core/E1 region of HCV
subtype 2d, type 3 (more particularly new sequences for subtypes 3a and 3c), new type 4
subtypes (more particularly new sequences for subtypes 4a, 4e, 4f, 4g, 4h, 4i and 4j) and
type 5a, as shown in Figure 5;

- new sequences spanning amino acid positions 328 to 546 of the E1/E2 region of HCV
subtype 5a as shown in Figure 12;

- new sequences spanning amino acid positions 1556 to 1764 of the NS3/NS4 region of
HCV type 3 (more particularly for new subtype 3a sequences), and subtype 5a, as shown
in Figure 7 or 11;

- new sequences spanning amino acid positions 2645 to 2757 of the NS5B region of HCV
subtype 2d, type 3 (more particularly for new subtypes 3a and 3c), new type 4 subtypes
(more particularly subtypes 4a, 4e, 4f, 4g, 4h, 4i and 4j) and subtype 5a, as shown in
Figure 2.

Using the LiPA system mentioned above, Brazilian blood donors with high-titer type 3
hepatitis C virus, Gabonese patients with high-titer type 4 hepatitis C virus, and a Belgian
patient with high-titer HCV type 5 infection were selected. Nucleotide sequences in the core,
E1, NS5 and NS4 regions which have not been reported before were analyzed in the
frame of the invention. Coding sequences (with the exception of the core region) of any type
4 isolate are reported for the first time in the present invention. The NS5b region was also
analyzed for the new type 3 isolates. After having determined the NS5b sequences,
comparison with the Ta and Tb subtypes described by Mori et al. (1992) was possible, and
the type 3 sequences could be identified as type 3a genotypes. The new type 4 isolates
segregated into 10 subtypes, based on homologies obtained in the NS5 and E1 regions. New
type 2 and 3 sequences could also be distinguished from previously described type 2 or 3
subtypes from sera collected in Belgium and the Netherlands.

The term "polynucleic acid" refers to a single-stranded or double-stranded nucleic acid
sequence which may contain from at least 5 contiguous nucleotides up to the complete nucleotide
sequence (for instance at least 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 or more contiguous nucleotides). A
polynucleic acid which is up to about 100 nucleotides in length is often also referred to as
an oligonucleotide. A polynucleic acid may consist of deoxyribonucleotides or
ribonucleotides, nucleotide analogues or modified nucleotides, or may have been adapted for
therapeutic purposes. A polynucleic acid may also comprise a double-stranded cDNA clone
which can be used for cloning purposes, or for in vivo therapy, or prophylaxis.

The term "polynucleic acid composition" refers to any kind of composition comprising
essentially said polynucleic acids. Said composition may be of a diagnostic or a therapeutic
nature.

The expression "nucleotides corresponding to" refers to nucleotides which are homologous
or complementary to an indicated nucleotide sequence or region within a specific HCV
sequence.
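As an illustration only, the idea of a polynucleic acid sharing at least 5, preferably 8, contiguous nucleotides with an indicated sequence or with its complement can be sketched in Python. The function names and the toy sequences below are hypothetical and are not taken from the sequence listing; treating "corresponding to" as an exact match of a contiguous run is a simplifying assumption.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A<->T, C<->G)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def shares_contiguous_run(probe: str, target: str, k: int = 8) -> bool:
    """True if probe and target share k contiguous nucleotides.

    The probe is compared with the target strand and with the
    complementary strand of the target (assumption: exact matching).
    """
    probe, target = probe.upper(), target.upper()
    strands = (target, reverse_complement(target))
    # All windows of length k in the probe; empty if the probe is shorter than k.
    kmers = {probe[i:i + k] for i in range(len(probe) - k + 1)}
    return any(kmer in strand for kmer in kmers for strand in strands)


# Hypothetical toy sequences, for demonstration only.
toy_target = "TTATGAGCAA"
print(shares_contiguous_run("GGATGAGC", toy_target, k=5))  # shares ATGAG
print(shares_contiguous_run("GCTCAT", toy_target, k=6))    # matches the complement
```

A real comparison would be run against the sequences of the SEQ ID listing and would normally tolerate mismatches; this sketch only shows the contiguous-run criterion.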



The term "coding region" corresponds to the region of the HCV genome that encodes the 
HCV polyprotein. In fact, it comprises the complete genome with the exception of the 5'
untranslated region and 3' untranslated region.

The term "HCV polyprotein" refers to the HCV polyprotein of the HCV-J isolate (Kato
et al., 1990). The adenine residue at position 330 (Kato et al., 1990) is the first residue of
the ATG codon that initiates the long HCV polyprotein of 3010 amino acids in HCV-J and
other type 1b isolates, of 3011 amino acids in HCV-1 and other type 1a isolates, and of
3033 amino acids in type 2 isolates HC-J6 and HC-J8 (Okamoto et al., 1992).

This adenine is designated as position 1 at the nucleic acid level, and this methionine is
designated as position 1 at the amino acid level, in the present invention. As type 1a isolates
contain 1 extra amino acid in the NS5a region, coding sequences of type 1a and 1b have
identical numbering in the Core, E1, NS3, and NS4 regions, but will differ in the NS5b region
as indicated in Table 1. Type 2 isolates have 4 extra amino acids in the E2 region, and 17
or 18 extra amino acids in the NS5 region compared to type 1 isolates, and will differ in
numbering from type 1 isolates in the NS3/4 region and NS5b regions as indicated in Table 1.
TABLE 1

                    Positions        Positions        Positions        Positions
                    described in     described for    described for    described for
                    the present      HCV-J (Kato      HCV-1 (Choo      HC-J6, HC-J8
Region              invention*       et al., 1990)    et al., 1991)    (Okamoto et al., 1992)

Nucleotides
NS5b                8023/8235        8352/8564        8026/8238        8433/8645
                    7932/8271        8261/8600        7935/8274        8342/8681
NS3/4               4664/5292        4993/5621        4664/5292        5017/5645
                    4664/4730        4993/5059        4664/4730
                    4892/5292        5221/5621        4892/5292        5245/5645
                    3856/4209        4185/4523        3856/4209        4209/[illegible]
                    4936/5292        5265/5621                         [illegible]
Coding region of                     330/9359         1/9033           342/9439
present invention

Amino acids
NS5b                2675/2745        2675/2745        2676/2746        2698/2768
                    2645/2757        2645/2757        2646/2758        2668/2780
NS3/4               1556/1764        1556/1764        1556/1764        1560/1768
                    1286/1403        1286/1403        1286/1403        1290/1407
                    1646/1764        1646/1764        1646/1764        1650/1768


Table 1: Comparison of the HCV nucleotide and amino acid numbering system used in the
         present invention (*) with the numbering used for other prototype isolates. For
         example, 8352/8564 indicates the region designated by the numbering from
         nucleotide 8352 to nucleotide 8564 as described by Kato et al. (1990). Since the
         numbering system of the present invention starts at the polyprotein initiation site,
         the 329 nucleotides of the 5' untranslated region described by Kato et al. (1990)
         have to be subtracted, and the corresponding region is numbered from nucleotide
         8023 ("8352-329") to 8235 ("8564-329").
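The conversion described in the legend of Table 1 is simple arithmetic, and a minimal Python sketch of it follows. The constant and function names are hypothetical; the offset of 329 nucleotides and the worked positions come from the legend, and the codon calculation assumes that nucleotide 1 is the adenine of the initiating ATG, as stated in the text.

```python
# Nucleotides preceding the initiating ATG in the numbering of Kato et al. (1990).
KATO_5UR_LENGTH = 329


def kato_to_invention(position: int) -> int:
    """Convert a Kato et al. (1990) nucleotide position to the numbering
    used here, where the A of the initiating ATG is position 1."""
    if position <= KATO_5UR_LENGTH:
        raise ValueError("position lies in the 5' untranslated region")
    return position - KATO_5UR_LENGTH


def codon_number(position: int) -> int:
    """Amino acid number of the codon containing a coding-region nucleotide."""
    return (position - 1) // 3 + 1


start, end = kato_to_invention(8352), kato_to_invention(8564)
print(start, end)                              # 8023 8235
print(codon_number(start), codon_number(end))  # 2675 2745
```

The second line of output agrees with the amino acid range 2675/2745 listed for the NS5b region in Table 1.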
The term "HCV type" corresponds to a group of HCV isolates of which the complete
genome shows more than 74% homology at the nucleic acid level, or of which the NS5 region
between nucleotide positions 7932 and 8271 shows more than 74% homology at the nucleic
acid level, or of which the complete HCV polyprotein shows more than 78% homology at the
amino acid level, or of which the NS5 region between amino acids at positions 2645 and 2757
shows more than 80% homology at the amino acid level, to polyproteins of the other isolates
of the group, with said numbering beginning at the first ATG codon or first methionine of the
long HCV polyprotein of the HCV-J isolate (Kato et al., 1990). Isolates belonging to different
types of HCV exhibit homologies, over the complete genome, of less than 74% at the nucleic
acid level and less than 78% at the amino acid level. Isolates belonging to the same type
usually show homologies of about 92 to 95% at the nucleic acid level and 95 to 96% at the
amino acid level when belonging to the same subtype, and those belonging to the same type
but different subtypes preferably show homologies of about 79% at the nucleic acid level and
85-86% at the amino acid level.

More preferably the definition of HCV types is concluded from the classification of HCV
isolates according to their nucleotide distances calculated as detailed below:

(1) based on phylogenetic analysis of nucleic acid sequences in the NS5b region between
nucleotides 7935 and 8274 (Choo et al., 1991) or 8261 and 8600 (Kato et al., 1990) or 8342
and 8681 (Okamoto et al., 1991), isolates belonging to the same HCV type show nucleotide
distances of less than 0.34, usually less than 0.33, and more usually of less than 0.32, and
isolates belonging to the same subtype show nucleotide distances of less than 0.135, usually
of less than 0.13, and more usually of less than 0.125, and consequently isolates belonging to
the same type but different subtypes show nucleotide distances ranging from 0.135 to 0.34,
usually ranging from 0.1384 to 0.2977, and more usually ranging from 0.15 to 0.32, and
isolates belonging to different HCV types show nucleotide distances greater than 0.34, usually
greater than 0.35, and more usually of greater than 0.358, more usually ranging from 0.3581
to 0.6670.

(2) based on phylogenetic analysis of nucleic acid sequences in the core/E1 region between
nucleotides 378 and 957, isolates belonging to the same HCV type show nucleotide distances
of less than 0.38, usually of less than 0.37, and more usually of less than 0.364, and isolates
belonging to the same subtype show nucleotide distances of less than 0.17, usually of less than
0.16, and more usually of less than 0.15, more usually less than 0.135, more usually less than
0.134, and consequently isolates belonging to the same type but different subtypes show
nucleotide distances ranging from 0.15 to 0.38, usually ranging from 0.16 to 0.37, and more
usually ranging from 0.17 to 0.36, more usually ranging from 0.133 to 0.379, and isolates
belonging to different HCV types show nucleotide distances greater than 0.34, 0.35, 0.36,
usually more than 0.365, and more usually of greater than 0.37.

(3) based on phylogenetic analysis of nucleic acid sequences in the NS3/NS4 region
between nucleotides 4664 and 5292 (Choo et al., 1991) or between nucleotides 4993 and 5621
(Kato et al., 1990) or between nucleotides 5017 and 5645 (Okamoto et al., 1991), isolates
belonging to the same HCV type show nucleotide distances of less than 0.35, usually of less
than 0.34, and more usually of less than 0.33, and isolates belonging to the same subtype show
nucleotide distances of less than 0.19, usually of less than 0.18, and more usually of less than
0.17, and consequently isolates belonging to the same type but different subtypes show
nucleotide distances ranging from 0.17 to 0.35, usually ranging from 0.18 to 0.34, and more
usually ranging from 0.19 to 0.33, and isolates belonging to different HCV types show
nucleotide distances greater than 0.33, usually greater than 0.34, and more usually of greater
than 0.35.
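The three criteria above can be read as a pair of cut-offs per region. The following Python sketch uses only the broadest "same subtype" and "same type" limits stated in (1) to (3). The dictionary, the function name and the handling of values falling exactly on a limit are assumptions made for illustration; the narrower "usually" and "more usually" limits are not modelled.

```python
# (same-subtype limit, same-type limit) per region, taken from criteria (1)-(3).
CUTOFFS = {
    "NS5b": (0.135, 0.34),
    "Core/E1": (0.17, 0.38),
    "NS3/NS4": (0.19, 0.35),
}


def classify_pair(region: str, distance: float) -> str:
    """Classify two isolates from their pairwise nucleotide distance."""
    if distance < 0:
        raise ValueError("a nucleotide distance cannot be negative")
    subtype_limit, type_limit = CUTOFFS[region]
    if distance < subtype_limit:
        return "same subtype"
    if distance < type_limit:
        return "same type, different subtype"
    return "different types"


print(classify_pair("NS5b", 0.06))  # same subtype
print(classify_pair("NS5b", 0.22))  # same type, different subtype
print(classify_pair("NS5b", 0.50))  # different types
```

Because the stated limits for different regions overlap, a result from a single region should be read as indicative rather than final.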



Table 2: Molecular evolutionary distances

Region       Core/E1             E1                  NS5B                NS5B
             579 bp              384 bp              340 bp              222 bp

Isolates*    0.0017 - 0.1347     0.0026 - 0.2031     0.0003 - 0.1151     0.000 - 0.1323
             (0.0750 ± 0.0245)   (0.0969 ± 0.0239)   (0.0637 ± 0.0229)   (0.0607 ± 0.0205)

Subtypes*    0.1330 - 0.3794     0.1645 - 0.4869     0.1384 - 0.2977     0.117 - 0.3533
             (0.2786 ± 0.0363)   (0.3761 ± 0.0433)   (0.2219 ± 0.0341)   (0.2391 ± 0.0399)

Types*       0.3479 - 0.6306     0.4309 - 0.9561     0.3581 - 0.6670     0.3457 - 0.7471
             (0.4703 ± 0.0525)   (0.6308 ± 0.0928)   (0.4994 ± 0.0495)   (0.5295 ± 0.0627)

* Figures created by the PHYLIP program DNADIST are expressed as minimum to
maximum (average ± standard deviation). Phylogenetic distances for isolates belonging
to the same subtype ('isolates'), to different subtypes of the same type ('subtypes'), and
to different types ('types') are given.

In a comparative phylogenetic analysis of available sequences, ranges of molecular
evolutionary distances for different regions of the genome were calculated, based on 19,781
pairwise comparisons by means of the DNADIST program of the phylogeny inference
package PHYLIP version 3.5c (Felsenstein, 1993). The results are shown in Table 2 and
indicate that although the majority of distances obtained in each region fit with classification
of a certain isolate, only the ranges obtained in the 340 bp NS5B region are non-overlapping
and therefore conclusive. However, as was performed in the present invention, it is preferable
to obtain sequence information from at least 2 regions before final classification of a given
isolate.
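The statement that only the 340 bp NS5B ranges are non-overlapping can be checked directly from the minimum and maximum values of Table 2. The Python sketch below does this; the data structure and function name are hypothetical, and the upper bound of the last range is entered as 0.747 because its final digit is unclear in the printed table (it is not used by the check).

```python
# (min, max) distance ranges from Table 2, ordered: isolates, subtypes, types.
TABLE_2 = {
    "Core/E1 (579 bp)": [(0.0017, 0.1347), (0.1330, 0.3794), (0.3479, 0.6306)],
    "E1 (384 bp)":      [(0.0026, 0.2031), (0.1645, 0.4869), (0.4309, 0.9561)],
    "NS5B (340 bp)":    [(0.0003, 0.1151), (0.1384, 0.2977), (0.3581, 0.6670)],
    "NS5B (222 bp)":    [(0.000, 0.1323), (0.117, 0.3533), (0.3457, 0.747)],
}


def is_non_overlapping(ranges):
    """True if each range ends strictly below the start of the next one."""
    return all(lower[1] < upper[0] for lower, upper in zip(ranges, ranges[1:]))


conclusive = [region for region, ranges in TABLE_2.items()
              if is_non_overlapping(ranges)]
print(conclusive)  # ['NS5B (340 bp)']
```

In the other three regions the maximum distance within a subtype exceeds the minimum distance between subtypes, which is why a single region other than the 340 bp NS5B fragment cannot settle a classification on its own.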

Designation of a number to the different types of HCV and HCV type nomenclature is
based on chronological discovery of the different types. The numbering system used in the
present invention might still fluctuate according to international conventions or guidelines. For
example, "type 4" might be changed into "type 5" or "type 6".

The term "subtype" corresponds to a group of HCV isolates of which the complete
polyprotein shows a homology of more than 90% both at the nucleic acid and amino acid
levels, or of which the NS5 region between nucleotide positions 7932 and 8271 shows a
homology of more than 90% at the nucleic acid level to the corresponding parts of the
genomes of the other isolates of the same group, with said numbering beginning with the
adenine residue of the initiation codon of the HCV polyprotein. Isolates belonging to the same
type but different subtypes of HCV show homologies of more than 74% at the nucleic acid
level and of more than 78% at the amino acid level.

The term "BR36 subgroup" refers to a group of type 3a HCV isolates (BR36, BR33,
BR34) that are 95%, preferably 95.5%, most preferably 96% homologous to the sequences
as represented in SEQ ID NO 1, 3, 5, 7, 9, 11 in the NS5b region from position 8023 to
8235.

It is to be understood that extremely variable regions like the E1, E2 and NS4 regions will
exhibit lower homologies than the average homology of the complete genome of the
polyprotein.

Using these criteria, HCV isolates can be classified into at least 6 types. Several subtypes
can clearly be distinguished in types 1, 2, 3 and 4: 1a, 1b, 2a, 2b, 2c, 2d, 3a, 3b, 4a, 4b,
4c, 4d, 4e, 4f, 4g, 4h, 4i and 4j based on homologies of the 5' UR and coding regions
including the part of NS5 between positions 7932 and 8271. An overview of most of the
reported isolates and their proposed classification according to the typing system of the
present invention as well as other proposed classifications is presented in Table 3.
Table 3

HCV CLASSIFICATION

        OKAMOTO   MORI   NAKAO   CHA    PROTOTYPE

1a      I         I      Pt      GI     HCV-1, HCV-H, HC-J1
1b      II        II     K1      GII    HCV-J, HCV-BK, HCV-T, HC-JK1, HC-J4, HCV-CHINA
1c                                      HC-G9
2a      III       III    K2a     GIII   HC-J6
2b      IV        IV     K2b     GIII   HC-J8
2c                                      S83, ARG6, ARG3, I10, T983
2d                                      NE92
3a      V         V      K3      GIV    E-b1, Ta, BR36, BR33, HD10, NZL1
3b                VI     K3      GIV    HCV-TR, Tb
3c                                      BE98
4a                                      Z4, GB809-4
4b                                      Z1
4c                                      GB116, GB358, GB215, Z6, Z7
4d                                      DK13
4e                                      GB809-2, CAM600, CAM736
4f                                      CAM622, CAM627
4g                                      GB549
4h                                      GB438
4i                                      CAR4/1205
4j                                      CAR1/501
4k                                      EG29
5a                               GV     SA3, SA4, SA1, SA7, SA11, BE95
6a                                      HK1, HK2, HK3, HK4

The term "complement" refers to a nucleotide sequence which is complementary to an
indicated sequence and which is able to hybridize to the indicated sequences.

The composition of the invention can comprise many combinations. By way of example,
the composition of the invention can comprise:

- two (or more) nucleic acids from the same region or,

- two nucleic acids (or more), respectively from different regions, for the same isolate or
for different isolates,

- or nucleic acids from the same regions and from at least two different regions (for the
same isolate or for different isolates).

The present invention relates more particularly to a polynucleic acid composition as defined
above, wherein said polynucleic acid corresponds to a nucleotide sequence selected from any
of the following HCV type 3 genomic sequences:

- an HCV genomic sequence having a homology of at least 67%, preferably more than 69%,
more preferably 71%, even more preferably more than 73%, or most preferably more than
76% to any of the sequences as represented in SEQ ID NO 13, 15, 17, 19, 21, 23, 25 or
27 (HD10, BR36 or BR33 sequences) in the region spanning positions 417 to 957 of the
Core/E1 region as shown in Figure 4;

- an HCV genomic sequence having a homology of at least 65%, preferably more than 67%,
preferably more than 69%, even preferably more than 70%, most preferably more than
74% to any of the sequences as represented in SEQ ID NO 13, 15, 17, 19, 21, 23, 25 or
27 (HD10, BR36 or BR33 sequences) in the region spanning positions 574 to 957 of the
E1 region as shown in Figure 4;

- an HCV genomic sequence having a homology of at least 79%, more preferably at least
81%, most preferably more than 83% to the sequence as represented in
SEQ ID NO 147 (representing positions 1 to 346 of the Core region of HCV type 3c,
sequence BE98) in the region spanning positions 1 to 378 of the Core region as shown in
Figure 3;

- an HCV genomic sequence of HCV type 3a having a homology of at least 74%, more
preferably at least 76%, most preferably more than 78% to any of the sequences
as represented in SEQ ID NO 13, 15, 17, 19, 21, 23, 25 or 27 (HD10, BR36 or BR33
sequences) in the region spanning positions 417 to 957 in the Core/E1 region as shown in
Figure 4;

- an HCV genomic sequence of HCV type 3a having a homology of at least 74%,
preferably more than 76%, most preferably 78% or more to any of the sequences as
represented in SEQ ID NO 13, 15, 17, 19, 21, 23, 25 or 27 (HD10, BR36 or BR33
sequences) in the region spanning positions 574 to 957 in the E1 region as shown in Figure
4;

- an HCV genomic sequence having a homology of more than 73.5%, preferably more
than 74%, most preferably 75%, to the sequence as represented in SEQ ID NO
29 (HCC153 sequence) in the region spanning positions 4664 to 4730 of the NS3 region
as shown in Figure 6;

- an HCV genomic sequence having a homology of more than 70%, preferably more than
72%, most preferably more than 74%, to any of the sequences as represented
in SEQ ID NO 29, 31, 33, 35, 37 or 39 (HCC153, HD10, BR36 sequences) in the region
spanning positions 4892 to 5292 in the NS3/NS4 region as shown in Figure 6 or 10;

- an HCV genomic sequence of the BR36 subgroup of HCV type 3a having a homology
of more than 95%, preferably 95.5%, most preferably 96%, to any of the
sequences as represented in SEQ ID NO 5, 7, 1, 3, 9 or 11 (BR34, BR33, BR36
sequences) in the region spanning positions 8023 to 8235 of the NS5 region as shown in
Figure 1;

- an HCV genomic sequence of the BR36 subgroup of HCV type 3a having a homology
of more than 96%, preferably 96.5%, most preferably 97%, to any of the
sequences as represented in SEQ ID NO 5, 7, 1, 3, 9 or 11 (BR34, BR33, BR36
sequences) in the region spanning positions 8023 to 8192 of the NS5B region as shown in
Figure 1;

- an HCV genomic sequence of HCV type 3c being characterized as having a homology of
more than 79%, more preferably more than 81%, and most preferably more than 83% to
the sequence as represented in SEQ ID NO 149 (BE98 sequence) in the region spanning
positions 7932 to 8271 in the NS5B region as shown in Figure 1.

Preferentially the above-mentioned genomic HCV sequences depict sequences from the
coding regions of all the above-mentioned sequences.
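By way of illustration, a homology figure over a numbered region can be computed from two aligned sequences as the percentage of identical positions. The Python sketch below assumes that "homology" is read as percent identity, that both sequences are already aligned and numbered from position 1, and that a gap never counts as a match. The function names and toy sequences are hypothetical; the example thresholds (more than 70%, 72%, 74%) are those stated above for positions 4892 to 5292 of the NS3/NS4 region.

```python
def percent_homology(seq_a: str, seq_b: str, start: int, end: int) -> float:
    """Percent identical positions between two aligned sequences over the
    1-based, inclusive region start..end."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not 1 <= start <= end <= len(seq_a):
        raise ValueError("region lies outside the alignment")
    window_a = seq_a[start - 1:end].upper()
    window_b = seq_b[start - 1:end].upper()
    matches = sum(a == b and a != "-" for a, b in zip(window_a, window_b))
    return 100.0 * matches / len(window_a)


def thresholds_exceeded(value: float, thresholds=(70.0, 72.0, 74.0)) -> int:
    """Number of successive 'more than' thresholds that the value exceeds."""
    return sum(value > t for t in thresholds)


# Hypothetical toy alignment, for demonstration only.
a = "ACGTACGTAC"
b = "ACGTTCGTAC"
print(percent_homology(a, b, 1, 10))  # 90.0
print(thresholds_exceeded(90.0))      # 3
```

With real data, the region bounds would be the nucleotide positions given in each item of the list, in the numbering of Table 1.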

According to the nucleotide distance classification system (with said nucleotide distances
being calculated as explained above), said sequences of said composition are selected from:

- an HCV genomic sequence being characterized as having a nucleotide distance of less than
0.44, preferably of less than 0.40, most preferably of less than 0.36 to any of the
sequences as represented in SEQ ID NO 13, 15, 17, 19, 21, 23, 25 or 27 in the region
spanning positions 417 to 957 of the Core/E1 region as shown in Figure 4;

- an HCV genomic sequence being characterized as having a nucleotide distance of less than
0.53, preferably less than 0.49, most preferably of less than 0.45 to any of the sequences
as represented in SEQ ID NO 19, 21, 23, 25 or 27 in the region spanning positions 574
to 957 of the E1 region as shown in Figure 4;

- an HCV genomic sequence being characterized as having a nucleotide distance of less than
0.15, preferably less than 0.13, and most preferably less than 0.11 to the sequence as
represented in SEQ ID NO 147 in the region spanning positions 1 to 378 of the Core
region as shown in Figure 3;

- an HCV genomic sequence of HCV type 3a being characterized as having a nucleotide
distance of less than 0.3, preferably less than 0.26, most preferably of less than 0.22 to
any of the sequences as represented in SEQ ID NO 13, 15, 17, 19, 21, 23, 25 or 27 in the
region spanning positions 417 to 957 in the Core/E1 region as shown in Figure 4;

- an HCV genomic sequence of HCV type 3a being characterized as having a nucleotide
distance of less than 0.35, preferably less than 0.31, most preferably of less than 0.27 to
any of the sequences as represented in SEQ ID NO 13, 15, 17, 19, 21, 23, 25 or 27 in the
region spanning positions 574 to 957 in the E1 region as shown in Figure 4;

- an HCV genomic sequence of the BR36 subgroup of HCV type 3a being characterized as
having a nucleotide distance of less than 0.0423, preferably less than 0.042, preferably
less than 0.0362 to any of the sequences as represented in SEQ ID NO 5, 7, 1, 3, 9 or 11
in the region spanning positions 8023 to 8235 of the NS5 region as shown in Figure 1;

- an HCV genomic sequence of HCV type 3c being characterized as having a nucleotide
distance of less than 0.255, preferably of less than 0.25, more preferably of less than 0.21,
most preferably of less than 0.17 to the sequence as represented in SEQ ID NO 149 in the
region spanning positions 7932 to 8271 in the NS5B region as shown in Figure 1.

In the present application, the E1 sequences encoding the antigenic ectodomain of the E1
protein, which does not overlap the carboxyterminal signal-anchor sequences of E1 disclosed
by Cha et al. (1992; WO 92/19743), in addition to the NS4 epitope region, and a part of the
NS5 region are disclosed for 4 different isolates: BR33, BR34, BR36, HCC153 and HD10,
all belonging to type 3a (SEQ ID NO 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29,
31, 35, 37 or 39).

Also within the present invention are new subtype 3c sequences (SEQ ID NO 147, 149) of
the isolate BE98 in the Core and NS5 regions (see Figures 3 and 1).

Finally, the present invention also relates to a new subtype 3a sequence as represented in
SEQ ID NO 217 (see Figure 1).

Also included within the present invention are sequence variants of the polynucleic acids
as selected from any of the nucleotide sequences as given in any of the above-mentioned SEQ
ID numbers, with said sequence variants containing either deletions and/or insertions of one
or more nucleotides, mainly at the extremities of oligonucleotides (either 3' or 5'), or
substitutions of some non-essential nucleotides by others (including modified nucleotides and/or
inosine); for example, a type 1 or 2 sequence might be modified into a type 3 sequence by
replacing some nucleotides of the type 1 or 2 sequence with type-specific nucleotides of type
3 as shown in Figure 1 (NS5 region), Figure 3 (Core region), Figure 4 (Core/E1 region),
Figure 6 and 10 (NS3/NS4 region).

According to another embodiment, the present invention relates to a polynucleic acid 
composition as defined above, wherein said polynucleic acids correspond to a nucleotide
sequence selected from any of the following HCV type 5 genomic sequences: 

- an HCV genomic sequence having a homology of more than 85%, preferably more than
86%, most preferably more than 87% homology to any of the sequences as represented
in SEQ ID NO 41, 43, 45, 47, 49, 51, 53 (PC sequences) or 151 (BE95 sequence) in the
region spanning positions 1 to 573 of the Core region as shown in Figure 9 and 3;

- an HCV genomic sequence having a homology of more than 61%, preferably more than
63%, more preferably more than 65% homology, even more preferably more than 66%
homology and most preferably more than 67% homology (e.g. 69 and 71%) to any of the
sequences as represented in SEQ ID NO 41, 43, 45, 47, 49, 51, 53 (PC sequences), 153
or 155 (BE95, BE100 sequences) in the region spanning positions 574 to 957 of the E1
region as shown in Figure 4;

- an HCV genomic sequence having a homology of more than 76.5%, preferably of more
than 77%, most preferably of more than 78% homology with any of the sequences as
represented in SEQ ID NO 55, 57, 197 or 199 (PC sequences) in the region spanning
positions 3856 to 4209 of the NS3 region as shown in Figure 6 or 10;

- an HCV genomic sequence having a homology of more than 68%, preferably of more than
70%, most preferably of more than 72% homology with the sequence as represented in
SEQ ID NO 157 (BE95 sequence) in the region spanning positions 980 to 1179 of the
E1/E2 region as shown in Figure 13;

- an HCV genomic sequence having a homology of more than 57%, preferably more than
59%, most preferably more than 61% homology to any of the sequences as represented
in SEQ ID NO 59 or 61 (PC sequences) in the region spanning positions 4936 to 5296 of
the NS4 region as shown in Figure 6 or 10;

- an HCV genomic sequence having a homology of more than 93%, preferably more than
93.5%, most preferably more than 94% homology to any of the sequences as represented
in SEQ ID NO 159 or 161 (BE95 or BE96 sequences) in the region spanning positions
7932 to 8271 of the NS5B region as shown in Figure 1.
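
The tiered homology limits in the list above can be illustrated with a small sketch. It assumes that "homology" means the percentage of identical positions between two sequences that have already been aligned over the stated region; the function names and the toy 20-nucleotide sequences are hypothetical and are not taken from the sequence listing.

```python
def percent_homology(seq_a, seq_b):
    """Percentage of identical positions between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Skip alignment columns in which either sequence carries a gap.
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def preference_tier(homology, thresholds):
    """Map a homology value onto the 'more than X%' tiers used in the text."""
    base, preferred, most_preferred = thresholds
    if homology > most_preferred:
        return "most preferred"
    if homology > preferred:
        return "preferred"
    if homology > base:
        return "within scope"
    return "outside scope"


# Limits quoted in the text for the type 5 Core region (positions 1 to 573).
CORE_TYPE5 = (85.0, 86.0, 87.0)

# Toy sequences invented for illustration only (one mismatch in 20 positions).
reference = "ATGAGCACGAATCCTAAACC"
candidate = "ATGAGCACAAATCCTAAACC"

homology = percent_homology(reference, candidate)
print(homology, preference_tier(homology, CORE_TYPE5))
```

In practice the comparison would be run over the full region named in each item (for example positions 1 to 573 for the Core region) rather than over a short fragment.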

Preferentially the above-mentioned genomic HCV sequences depict sequences from the 
coding regions of all the above-mentioned sequences.

According to the nucleotide distance classification system (with said nucleotide distances
being calculated as explained above), said sequences of said composition are selected from:

- a nucleotide distance of less than 0.53, preferably less than 0.51, more preferably less than
0.49 for the E1 region to the type 5 sequences depicted above;

- a nucleotide distance of less than 0.3, preferably less than 0.28, more preferably of less
than 0.26 for the Core region to the type 5 sequences depicted above;

- a nucleotide distance of less than 0.072, preferably less than 0.071, more preferably less
than 0.070 for the NS5B region to the type 5 sequences as depicted above.
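
A distance-based check of this kind can be sketched as follows. The method by which nucleotide distances are calculated is explained earlier in the document and is not repeated in this section, so the sketch assumes Kimura's two-parameter model purely for illustration; the function name and the toy sequences are hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def kimura_2p_distance(seq_a, seq_b):
    """Kimura two-parameter distance between two aligned DNA sequences.

    Assumption: this model stands in for the distance calculation that the
    document defines elsewhere. Very divergent pairs make the logarithm
    undefined, in which case math.log raises ValueError.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1    # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1  # purine<->pyrimidine
    if compared == 0:
        raise ValueError("no comparable positions")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Toy sequences invented for illustration (a single G->A transition in 20 sites).
reference = "ATGAGCACGAATCCTAAACC"
candidate = "ATGAGCACAAATCCTAAACC"

distance = kimura_2p_distance(reference, candidate)
# Compare against the broadest NS5B limit quoted above for type 5 (0.072).
within_ns5b_limit = distance < 0.072
print(round(distance, 4), within_ns5b_limit)
```

The same comparison applies to the E1 and Core limits by substituting the corresponding region and threshold.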

Isolates with similar sequences in the 5'UR to a group of isolates including SA1, SA3 and
SA7 described in the 5'UR by Bukh et al. (1992), have been reported and described in the
5'UR and NS5 region as group V by Cha et al. (1992; WO 92/19743). This group of isolates
belongs to type 5a as described in the present invention (SEQ ID NO 41, 43, 45, 47, 49, 51,
53, 55, 57, 59, 61, 151, 153, 155, 157, 159, 161, 197 and 199).

Also included within the present invention are sequence variants of the polynucleic acids
as selected from any of the nucleotide sequences as given in any of the above given SEQ ID
numbers with said sequence variants containing either deletions and/or insertions of one or
more nucleotides, mainly at the extremities of oligonucleotides (either 3' or 5'), or
substitutions of some non-essential nucleotides (i.e. nucleotides not essential to discriminate
between different genotypes of HCV) by others (including modified nucleotides and/or
inosine); for example, a type 1 or 2 sequence might be modified into a type 5 sequence by
replacing some nucleotides of the type 1 or 2 sequence with type-specific nucleotides of type
5 as shown in Figure 3 (Core region), Figure 4 (Core/E1 region), Figure 10 (NS3/NS4
region), Figure 14 (E1/E2 region).

Another group of isolates including BU74 and BU79 having similar sequences in the 5'UR
to isolates including Z6 and Z7 as described in the 5'UR by Bukh et al. (1992), have been
described in the 5'UR and classified as a new type 4 by the inventors of this application
(Stuyver et al., 1993). Coding sequences, including core, E1 and NS5 sequences of several
new Gabonese isolates belonging to this group, are disclosed in the present invention (SEQ
ID NO 106, 108, 110, 112, 114, 116, 118, 120 and 122).

According to yet another embodiment, the present invention relates to a composition as 
defined above, wherein said polynucleic acids correspond to a nucleotide sequence selected 
from any of the following HCV type 4 genomic sequences: 

- an HCV genomic sequence having a homology of more than 66%, preferably more than
68%, most preferably more than 70% homology in the E1 region spanning positions 574
to 957 to any of the sequences as represented in SEQ ID NO 118, 120 or 122 (GB358,
GB549, GB809 sequences) as shown in Figure 4;

- an HCV genomic sequence having a homology of more than 71%, preferably more than
72%, most preferably more than 74% homology to any of the sequences as represented
in SEQ ID NO 118, 120 or 122 (GB358, GB549, GB809 sequences) in the region spanning
positions 379 to 957 of the E1 region as shown in Figure 4;

- an HCV genomic sequence having a homology of more than 92%, preferably more than
93%, most preferably more than 94% homology to any of the sequences as represented
in SEQ ID NO 163 or 165 (GB809, CAM600 sequences) in the region spanning positions
1 to 378 of the Core/E1 region as shown in Figure 4;

- an HCV genomic sequence (subtype 4c) having a homology of more than 85%, preferably
more than 86%, more preferably more than 86.5% homology, most preferably more than
87, more than 88 or more than 89% homology to any of the sequences as represented in
SEQ ID NO 183, 185 or 187 (GB116, GB215, GB809 sequences) in the region spanning
positions 379 to 957 of the E1 region as shown in Figure 4;

- an HCV genomic sequence (subtype 4a) having a homology of more than 81%, preferably
more than 83%, most preferably more than 85% homology to the sequence as represented
in SEQ ID NO 189 (GB908 sequence) in the region spanning positions 379 to 957 of the
E1 region as shown in Figure 4;

- an HCV genomic sequence (subtype 4e) having a homology of more than 85%, preferably
more than 87%, most preferably more than 89% homology to any of the sequences as
represented in SEQ ID NO 167 or 169 (CAM600, GB908 sequences) in the region
spanning positions 379 to 957 of the E1 region as shown in Figure 4;

- an HCV genomic sequence (subtype 4f) having a homology of more than 79%, preferably
more than 81%, most preferably more than 83% homology to any of the sequences as
represented in SEQ ID NO 171 or 173 (CAMG22, CAMG27 sequences) in the region
spanning positions 379 to 957 of the E1 region as shown in Figure 4;

- an HCV genomic sequence (subtype 4g) having a homology of more than 84%, preferably
more than 86%, most preferably more than 88% homology to the sequence as represented
in SEQ ID NO 175 (GB549 sequence) in the region spanning positions 379 to 957 of the
E1 region as shown in Figure 4;

- an HCV genomic sequence (subtype 4h) having a homology of more than 83%, preferably
more than 85%, most preferably more than 87% homology to the sequence as represented
in SEQ ID NO 177 (GB438 sequence) in the region spanning positions 379 to 957 of the
E1 region as shown in Figure 4;

- an HCV genomic sequence (subtype 4i) having a homology of more than 76%,
preferably more than 78%, most preferably more than 80% homology to the sequence as
represented in SEQ ID NO 179 (CAR4/1205 sequence) in the region spanning positions
379 to 957 of the E1 region as shown in Figure 4;

- an HCV genomic sequence (subtype 4j?) having a homology of more than 84%, preferably
more than 86%, most preferably more than 88% homology to the sequence as represented
in SEQ ID NO 181 (CAR4/901 sequence) in the region spanning positions 379 to 957 of
the E1 region as shown in Figure 4;

- an HCV genomic sequence having a homology of more than 73%, preferably more than
75%, most preferably more than 77% homology to any of the sequences as represented
in SEQ ID NO 106, 108, 110, 112, 114, or 116 (GB48, GB116, GB215, GB358, GB549,
GB809 sequences) in the region spanning positions 7932 to 8271 of the NS5 region as
shown in Figure 1;

- an HCV genomic sequence (subtype 4c) having a homology of more than 88%, preferably
more than 89%, most preferably more than 90% homology to any of the sequences as
represented in SEQ ID NO 106, 108, 110, or 112 (GB48, GB116, GB215, GB358
sequences) in the region spanning positions 7932 to 8271 of the NS5 region as shown in
Figure 1;

- an HCV genomic sequence (subtype 4e) having a homology of more than 88%, preferably
more than 89%, most preferably more than 90% homology to any of the sequences as
represented in SEQ ID NO 116 or 201 (GB809 or CAM600 sequences) in the region
spanning positions 7932 to 8271 of the NS5 region as shown in Figure 1;

- an HCV genomic sequence (subtype 4f) having a homology of more than 87%, preferably
more than 89%, most preferably more than 90% homology to the sequence as represented
in SEQ ID NO 203 (CAMG22 sequence) in the region spanning positions 7932 to 8271
of the NS5 region as shown in Figure 1;

- an HCV genomic sequence (subtype 4g) having a homology of more than 85%,
preferably more than 87%, most preferably more than 89% homology to the sequence as
represented in SEQ ID NO 114 (GB549 sequence) in the region spanning positions 7932
to 8271 of the NS5 region as shown in Figure 1;

- an HCV genomic sequence (subtype 4h) having a homology of more than 86%,
preferably more than 87%, more preferably more than 88% homology, more preferably
more than 89% homology to the sequence as represented in SEQ ID NO 207 (GB437
sequence) in the region spanning positions 7932 to 8271 of the NS5 region as shown in
Figure 1;

- an HCV genomic sequence (subtype 4i) having a homology of more than 84%, preferably
more than 86%, most preferably more than 88% homology to the sequence as represented
in SEQ ID NO 209 (CAR4/1205 sequence) in the region spanning positions 7932 to 8271
of the NS5 region as shown in Figure 1;

- an HCV genomic sequence (subtype 4j) having a homology of more than 81%, preferably
more than 83%, most preferably more than 85% homology to the sequence as represented
in SEQ ID NO 211 (CAR1/501 sequence) in the region spanning positions 7932 to 8271
of the NS5 region as shown in Figure 1.

Preferentially the above-mentioned genomic HCV sequences depict sequences from the 
coding regions of all the above-mentioned sequences. 

According to the nucleotide distance classification system (with said nucleotide distances 
being calculated as explained above), said sequences of said composition are selected from:

- an HCV genomic sequence (type 4) being characterized as having a nucleotide distance of
less than 0.52, 0.50, 0.4880, 0.46, 0.44, 0.43 or most preferably less than 0.42 in the
region spanning positions 574 to 957 to any of the sequences as represented in SEQ ID NO
118, 120 or 122 in the region spanning positions 1 to 957 of the Core/E1 region as shown
in Figure 4;

- an HCV genomic sequence (type 4) being characterized as having a nucleotide distance of
less than 0.39, 0.36, 0.34, 0.32 or most preferably less than 0.31 to any of the sequences
as represented in SEQ ID NO 118, 120 or 122 in the region spanning positions 379 to 957
of the E1 region as shown in Figure 4;

- an HCV genomic sequence (subtype 4c) being characterized as having a nucleotide distance
of less than 0.27, 0.26, 0.2., 0.22, 0.20, 0.18, 0.17, 0.162, 0.16 or most preferably less
than 0.15 to any of the sequences as represented in SEQ ID NO 183, 185 or 187 in the
region spanning positions 379 to 957 of the E1 region as shown in Figure 4;

- an HCV genomic sequence (subtype 4a) being characterized as having a nucleotide distance
of less than 0.30, 0.28, 0.26, 0.24, 0.22, 0.21 or most preferably of less than 0.205 to the
sequence as represented in SEQ ID NO 189 in the region spanning positions 379 to 957
of the E1 region as shown in Figure 4;

- an HCV genomic sequence (subtype 4e) being characterized as having a nucleotide distance
of less than 0.26, 0.25, 0.23, 0.21, 0.19, 0.17, 0.165, most preferably less than 0.16 to
any of the sequences as represented in SEQ ID NO 167 or 169 in the region spanning
positions 379 to 957 of the E1 region as shown in Figure 4;

- an HCV genomic sequence (subtype 4f) being characterized as having a nucleotide distance
of less than 0.26, 0.24, 0.22, 0.20, 0.18, 0.16, 0.15 or most preferably less than 0.14 to
any of the sequences as represented in SEQ ID NO 171 or 173 in the region spanning
positions 379 to 957 of the E1 region as shown in Figure 4;

- an HCV genomic sequence (subtype 4g) being characterized as having a nucleotide
distance of less than 0.20, 0.19, 0.18, 0.17 or most preferably of less than 0.16 to the
sequence as represented in SEQ ID NO 175 in the region spanning positions 379 to 957
of the E1 region as shown in Figure 4;

- an HCV genomic sequence (subtype 4h) being characterized as having a nucleotide
distance of less than 0.20, 0.19, 0.18, 0.17 and most preferably of less than 0.16 to the
sequence as represented in SEQ ID NO 177 in the region spanning positions 379 to 957
of the E1 region as shown in Figure 4;

- an HCV genomic sequence (subtype 4i) being characterized as having a nucleotide distance
of less than 0.27, 0.25, 0.23, 0.21 and preferably less than 0.16 to the sequence as
represented in SEQ ID NO 179 in the region spanning positions 379 to 957 of the E1
region as shown in Figure 4;

- an HCV genomic sequence (subtype 4j?) being characterized as having a nucleotide
distance of less than 0.19, 0.18, 0.17, 0.165 and most preferably of less than 0.16 to the
sequence as represented in SEQ ID NO 181 in the region spanning positions 379 to 957
of the E1 region as shown in Figure 4;

- an HCV genomic sequence (type 4) being characterized as having a nucleotide distance of
less than 0.35, 0.34, 0.32 and most preferably of less than 0.30 to any of the sequences
as represented in SEQ ID NO 106, 108, 110, 112, 114, or 116 in the region spanning
positions 7932 to 8271 of the NS5 region as shown in Figure 1;

- an HCV genomic sequence (subtype 4c) being characterized as having a nucleotide distance
of less than 0.18, 0.16, 0.14, 0.135, 0.13, 0.1275 or most preferably less than 0.125 to
any of the sequences as represented in SEQ ID NO 106, 108, 110, or 112 in the region
spanning positions 7932 to 8271 of the NS5 region as shown in Figure 1;

- an HCV genomic sequence (subtype 4e) being characterized as having a nucleotide distance
of less than 0.15, 0.14, 0.135, 0.13 and most preferably of less than 0.125 to any of the
sequences as represented in SEQ ID NO 116 or 201 in the region spanning positions 7932
to 8271 of the NS5 region as shown in Figure 1;

- an HCV genomic sequence (subtype 4f) being characterized as having a nucleotide distance
of less than 0.15, 0.14, 0.135, 0.13 or most preferably less than 0.125 to the sequence as
represented in SEQ ID NO 203 in the region spanning positions 7932 to 8271 of the NS5
region as shown in Figure 1;

- an HCV genomic sequence (subtype 4g) being characterized as having a nucleotide
distance of less than 0.17, 0.16, 0.15, 0.14, 0.13 or most preferably less than 0.125 to the
sequence as represented in SEQ ID NO 114 in the region spanning positions 7932 to 8271
of the NS5 region as shown in Figure 1;

- an HCV genomic sequence (subtype 4h) being characterized as having a nucleotide
distance of less than 0.155, 0.15, 0.145, 0.14, 0.135, 0.13 or most preferably less than
0.125 to the sequence as represented in SEQ ID NO 207 in the region spanning positions
7932 to 8271 of the NS5 region as shown in Figure 1;

- an HCV genomic sequence (subtype 4i) being characterized as having a nucleotide distance
of less than 0.17, 0.16, 0.15, 0.14, 0.13 or most preferably of less than 0.125 to the
sequence as represented in SEQ ID NO 209 in the region spanning positions 7932 to 8271
of the NS5 region as shown in Figure 1;

- an HCV genomic sequence (subtype 4j) being characterized as having a nucleotide distance
of less than 0.21, 0.20, 0.19, 0.18, 0.17, 0.16, 0.15, 0.14, 0.13 and most preferably of
less than 0.125 to the sequence as represented in SEQ ID NO 211 in the region spanning
positions 7932 to 8271 of the NS5 region as shown in Figure 1.

Also included within the present invention are sequence variants of the polynucleic acids
as selected from any of the nucleotide sequences as given in any of the above given SEQ ID
numbers with said sequence variants containing either deletions and/or insertions of one or
more nucleotides, mainly at the extremities of oligonucleotides (either 3' or 5'), or
substitutions of some non-essential nucleotides (i.e. nucleotides not essential to discriminate
between different genotypes of HCV) by others (including modified nucleotides and/or
inosine); for example, a type 1 or 2 sequence might be modified into a type 4 sequence by
replacing some nucleotides of the type 1 or 2 sequence with type-specific nucleotides of type
4 as shown in Figure 3 (Core region), Figure 4 (Core/E1 region), Figure 10 (NS3/NS4
region), Figure 14 (E1/E2 region).

The present invention also relates to a sequence as represented in SEQ ID NO 193 (GB724 
sequence). 

After aligning NS5 or E1 sequences of GB48, GB116, GB215, GB358, GB549 and
GB809, these isolates clearly segregated into 3 subtypes within type 4: GB48, GB116,
GB215 and GB358 belong to the subtype designated 4c, GB549 to subtype 4g and GB809 to
subtype 4e. In NS5, GB809 (subtype 4e) showed a higher nucleic acid homology to subtype
4c isolates (85.6 - 86.8%) than to GB549 (subtype 4g, 79.7%), while GB549 showed similar
homologies to both other subtypes (78.8 to 80% to subtype 4c and 79.7% to subtype 4e). In
E1, subtype 4c showed equal nucleic acid homologies of 75.2% to subtypes 4a and 4e while
4g and 4e were 78.4% homologous. At the amino acid level however, subtype 4e showed a
normal homology to subtype 4c (80.2%), while subtype 4g was more homologous to 4c
(83.3%) and 4e (84.1%).

According to yet another embodiment, the present invention relates to a composition as 
defined above, wherein said polynucleic acids correspond to a nucleotide sequence selected
from any of the following HCV type 2d genomic sequences: 

- an HCV genomic sequence having a homology of more than 78%, preferably more than
80%, most preferably more than 82% homology to the sequence as represented in SEQ
ID NO 143 (NE92) in the region spanning positions 379 to 957 of the Core/E1 region as
shown in Figure 4;

- an HCV genomic sequence having a homology of more than 74%, preferably more than
76%, most preferably more than 78% homology to the sequence as represented in SEQ
ID NO 143 (NE92) in the region spanning positions 574 to 957 as shown in Figure 4;

- an HCV genomic sequence having a homology of more than 87%, preferably more than
89%, most preferably more than 91% homology to the sequence as represented in SEQ
ID NO 145 (NE92) in the region spanning positions 7932 to 8271 of the NS5B region as
shown in Figure 1.

Preferentially the above-mentioned genomic HCV sequences depict sequences from the 
coding regions of all the above-mentioned sequences. 

According to the nucleotide distance classification system (with said nucleotide distances 
being calculated as explained above), said sequences of said composition are selected from:

- a nucleotide distance of less than 0.32, preferably less than 0.31, more preferably less than
0.30 for the E1 region (574 to 957) to any of the above specified sequences;

- a nucleotide distance of less than 0.08, preferably less than 0.07, more preferably less than
0.06 for the Core region (1 to 378) to any of the above given sequences;

- a nucleotide distance of less than 0.15, preferentially less than 0.13, more preferentially
less than 0.12 for the NS5B region to any of the above-specified sequences.

Polynucleic acid sequences according to the present invention which are homologous to the
sequences as represented by a SEQ ID NO can be characterized and isolated according to any
of the techniques known in the art, such as amplification by means of type or subtype specific
primers, hybridization with type or subtype specific probes under more or less stringent
conditions, serological screening methods (see examples 4 and 11) or via the LiPA typing
system.

Polynucleic acid sequences of the genomes indicated above from regions not yet depicted
in the present examples, figures and sequence listing can be obtained by any of the techniques
known in the art, such as amplification techniques using suitable primers from the type or
subtype specific sequences of the present invention.

The present invention relates also to a composition as defined above, wherein said
polynucleic acid is liable to act as a primer for amplifying the nucleic acid of a certain isolate
belonging to the genotype from which the primer is derived.

An example of a primer according to this embodiment of the invention is HCPr 152 as 
shown in table 7 (SEQ ID NO 79). 

The term "primer" refers to a single stranded DNA oligonucleotide sequence capable of
acting as a point of initiation for synthesis of a primer extension product which is
complementary to the nucleic acid strand to be copied. The length and the sequence of the
primer must be such that they allow priming of the synthesis of the extension products.
Preferably the primer is about 5-50 nucleotides. Specific length and sequence will depend on
the complexity of the required DNA or RNA targets, as well as on the conditions of primer
use such as temperature and ionic strength.

The fact that amplification primers do not have to match exactly with the corresponding
template sequence to warrant proper amplification is amply documented in the literature
(Kwok et al., 1990).
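
The tolerance for imperfect primer-template matches can be illustrated with a short sketch that scans a template for sites where a primer would bind with at most a given number of mismatches. This is a simplification under stated assumptions: the template is written in the same sense as the primer, mismatches are simply counted without regard to their position, and the function name, the mismatch limit and the toy sequences are hypothetical.

```python
def primer_sites(template, primer, max_mismatches=2):
    """Return (start, mismatches) for every window matching within the limit.

    Assumption: the template is given in the same 5'->3' sense as the primer,
    so a plain character comparison stands in for base pairing.
    """
    template = template.upper()
    primer = primer.upper()
    sites = []
    for start in range(len(template) - len(primer) + 1):
        window = template[start:start + len(primer)]
        mismatches = sum(1 for t, p in zip(window, primer) if t != p)
        if mismatches <= max_mismatches:
            sites.append((start, mismatches))
    return sites


# Toy sequences invented for illustration; the primer differs from the
# template at one position.
template = "GGCTAGCATGCAAGCTTGGC"
primer = "CATGGAAG"

print(primer_sites(template, primer, max_mismatches=1))  # [(6, 1)]
print(primer_sites(template, primer, max_mismatches=0))  # []
```

A stricter model would weight mismatches by position or by annealing temperature; the plain count is used here only to keep the example short.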

The amplification method used can be either polymerase chain reaction (PCR; Saiki et al.,
1988), ligase chain reaction (LCR; Landegren et al., 1988; Wu & Wallace, 1989; Barany,
1991), nucleic acid sequence-based amplification (NASBA; Guatelli et al., 1990; Compton,
1991), transcription-based amplification system (TAS; Kwoh et al., 1989), strand
displacement amplification (SDA; Duck, 1990; Walker et al., 1992) or amplification by
means of Qβ replicase (Lizardi et al., 1988; Lomeli et al., 1989) or any other suitable method
to amplify nucleic acid molecules using primer extension. During amplification, the amplified
products can be conveniently labelled either using labelled primers or by incorporating
labelled nucleotides. Labels may be isotopic (32P, 35S, etc.) or non-isotopic (biotin,
digoxigenin, etc.). The amplification reaction is repeated between 20 and 80 times,
advantageously between 30 and 50 times.

The present invention also relates to a composition as defined above, wherein said 
polynucleic acid is able to act as a hybridization probe for specific detection and/or 
classification into types of a nucleic acid containing said nucleotide sequence, with said
oligonucleotide being possibly labelled or attached to a solid substrate. 

The term "probe" refers to single stranded sequence-specific oligonucleotides which have
a sequence which is complementary to the target sequence of the HCV genotype(s) to be
detected.

Preferably, these probes are about 5 to 50 nucleotides long, more preferably from about
10 to 25 nucleotides.

The term "solid support" can refer to any substrate to which an oligonucleotide probe can
be coupled, provided that it retains its hybridization characteristics and provided that the
background level of hybridization remains low. Usually the solid substrate will be a microtiter
plate, a membrane (e.g. nylon or nitrocellulose) or a microsphere (bead). Prior to application
to the membrane or fixation it may be convenient to modify the nucleic acid probe in order
to facilitate fixation or improve the hybridization efficiency. Such modifications may
encompass homopolymer tailing, coupling with different reactive groups such as aliphatic
groups, NH2 groups, SH groups, carboxylic groups, or coupling with biotin or haptens.

The present invention also relates to the use of a composition as defined above for
detecting the presence of one or more HCV genotypes, more particularly for detecting the
presence of a nucleic acid of any of the HCV genotypes having a nucleotide sequence as
defined above, present in a biological sample liable to contain them, comprising at least the
following steps:

(i) possibly extracting sample nucleic acid,

(ii) possibly amplifying the nucleic acid with at least one of the primers as defined
above or any other HCV subtype 2d, HCV type 3, HCV type 4, HCV type 5
or universal HCV primer,

(iii) hybridizing the nucleic acids of the biological sample, possibly under denatured
conditions, and with said nucleic acids being possibly labelled during or after
amplification, at appropriate conditions with one or more probes as defined above,
with said probes being preferably attached to a solid substrate,

(iv) washing at appropriate conditions,

(v) detecting the hybrids formed,

(vi) inferring the presence of one or more HCV genotypes present from the observed
hybridization pattern.

Preferably, this technique could be performed in the Core or NS5B region.

The term "nucleic acid" can also be referred to as analyte strand and corresponds to a
single- or double-stranded nucleic acid molecule. This analyte strand is preferentially positive-
or negative-stranded RNA, cDNA or amplified cDNA.

The term "biological sample" refers to any biological sample (tissue or fluid) containing
HCV nucleic acid sequences and refers more particularly to blood serum or plasma samples.

The term "HCV subtype 2d primer" refers to a primer which specifically amplifies HCV 
subtype 2d sequences present in a sample (see Examples section and figures). 

The term "HCV type 3 primer" refers to a primer which specifically amplifies HCV type
3 sequences present in a sample (see Examples section and figures).

The term "HCV type 4 primer" refers to a primer which specifically amplifies HCV type
4 genomes present in a sample.

The term "universal HCV primer" refers to oligonucleotide sequences complementary to 
any of the conserved regions of the HCV genome. 

The term "HCV type 5 primer" refers to a primer which specifically amplifies HCV type
5 genomes present in a sample.

The expression "appropriate" hybridization and washing conditions is to be understood
as meaning stringent conditions, which are generally known in the art (e.g. Maniatis et al.,
Molecular Cloning: A Laboratory Manual, New York, Cold Spring Harbor Laboratory, 1982).

However, according to the hybridization solution (SSC, SSPE, etc.), these probes should
be hybridized at their appropriate temperature in order to attain sufficient specificity.

The term "labelled" refers to the use of labelled nucleic acids. This may include the use
of labelled nucleotides incorporated during the polymerase step of the amplification such as
illustrated by Saiki et al. (1988) or Bej et al. (1990) or labelled primers, or by any other
method known to the person skilled in the art.

The process of the invention comprises the steps of contacting any of the probes as defmed 
above, with one of the following elements: 

either a biological sample in which the nudeic acids are made avaUable for 
hybridization, 

or die purified nucleic acids contained in the biological sample 
or a single copy derived from the purirled nucleic acids, 

or an amplified copy derived from the purified nucleic acids, with said dements or 

with said probes being attached to a solid subsffate. 
The expression "inferring the presence of one or more HCV genotypes present from the 
observed hybridization pattern" refers to the identification of the presence of HCV genomes 
in the sample by analyzing the pattern of binding of a panel of oligonucleotide probes. Single 
probes may provide useful information concerning the presence or absence of HCV genomes 
in a sample. On the other hand, the variation of the HCV genomes is dispersed in nature, so 
rarely is any one probe able to identify uniquely a specific HCV genome. Rather, the identity 
of an HCV genotype may be inferred from the pattern of binding of a panel of 
oligonucleotide probes, which are specific for (different) segments of the different HCV 
genomes. Depending on the choice of these oligonucleotide probes, each known HCV 
genotype will correspond to a specific hybridization pattern upon use of a specific 
combination of probes. Each HCV genotype can also be discriminated from any 
other HCV genotype amplified with the same primers, depending on the choice of the 
oligonucleotide probes. Comparison of the generated pattern of positively hybridizing probes 
for a sample containing one or more unknown HCV sequences to a scheme of expected 
hybridization patterns allows one to clearly infer the HCV genotypes present in said sample. 
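
The pattern comparison described above can be illustrated with a minimal sketch. The probe names and the expected patterns below are hypothetical placeholders chosen for illustration; they are not the probes of the invention, and a real scheme would be taken from the table of expected hybridization patterns. The sketch assumes that a genotype is reported when every probe of its expected pattern is positive.

```python
from typing import Dict, FrozenSet, Iterable, List

# Hypothetical panel: probe names and pattern assignments are placeholders,
# not the probes or the expected patterns of the invention.
EXPECTED_PATTERNS: Dict[str, FrozenSet[str]] = {
    "type 3": frozenset({"probe_A", "probe_B"}),
    "type 4": frozenset({"probe_C"}),
    "type 5": frozenset({"probe_B", "probe_D"}),
}


def infer_genotypes(positive_probes: Iterable[str],
                    expected: Dict[str, FrozenSet[str]] = EXPECTED_PATTERNS) -> List[str]:
    """Return the genotypes whose whole expected pattern is among the positive probes."""
    observed = frozenset(positive_probes)
    return sorted(name for name, pattern in expected.items() if pattern <= observed)


def unexplained_probes(positive_probes: Iterable[str],
                       expected: Dict[str, FrozenSet[str]] = EXPECTED_PATTERNS) -> List[str]:
    """Return positive probes that no inferred genotype accounts for."""
    observed = frozenset(positive_probes)
    explained = set()
    for name in infer_genotypes(observed, expected):
        explained |= expected[name]
    return sorted(observed - explained)
```

A sample that leaves probes unexplained would, under these assumptions, be a candidate for the sequencing step applied to aberrant patterns.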

The present invention thus relates to a method as defined above, wherein one or more 
hybridization probes are selected from any of SEQ ID NO 1, 3, 5, 7, 9, 11, 13, 15, 19, 
21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59 or 61, 106, 
108, 110, 112, 114, 116, 118, 120, 122, 143, 145, 147, 149, 151, 153, 155, 157, 159, 161, 
163, 165, 167, 169, 171, 173, 175, 177, 179, 181, 183, 185, 187, 189, 191, 193, 195, 197, 
199, 201, 203, 205, 207, 209, 211, 213, 215, 217, 222, 269 or sequence variants thereof, 
with said sequence variants containing deletions and/or insertions of one or more nucleotides, 
mainly at their extremities (either 3' or 5'), or substitutions of some non-essential nucleotides 
(i.e. nucleotides not essential to discriminate between genotypes) by others (including 
modified nucleotides or inosine), or with said variants consisting of the complement of any 
of the above-mentioned oligonucleotide probes, or with said variants consisting of 
ribonucleotides instead of deoxyribonucleotides, all provided that said variant probes can be 
caused to hybridize with the same specificity as the oligonucleotide probes from which they 
are derived. 
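
Three of the variant classes named above (complement, ribonucleotide form, and deletions at the extremities) are simple string operations on a probe sequence. The following is a minimal sketch only; the function names are hypothetical, the complement is written 5' to 3' as a reverse complement, and whether a given variant keeps the required specificity still has to be established experimentally.

```python
# Translation table for the four standard DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(probe: str) -> str:
    """Complementary strand of a DNA probe, written 5' to 3'."""
    return probe.translate(_COMPLEMENT)[::-1]


def as_rna(probe: str) -> str:
    """Ribonucleotide form of a DNA probe (T replaced by U)."""
    return probe.replace("T", "U").replace("t", "u")


def trim_ends(probe: str, five_prime: int = 0, three_prime: int = 0) -> str:
    """Delete the given number of nucleotides from the 5' and 3' extremities."""
    end = len(probe) - three_prime
    if five_prime < 0 or three_prime < 0 or five_prime >= end:
        raise ValueError("trimming would remove the whole probe")
    return probe[five_prime:end]
```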

In order to distinguish the amplified HCV genomes from each other, the target polynucleic 
acids are hybridized to a set of sequence-specific DNA probes targeting HCV genotypic 
regions located in the HCV polynucleic acids. 

Most of these probes target the most type-specific regions of HCV genotypes, but some 
can be caused to hybridize to more than one HCV genotype. 

According to the hybridization solution (SSC, SSPE, etc.), these probes should be 
stringently hybridized at their appropriate temperature in order to attain sufficient specificity. 
However, by slightly modifying the DNA probes, either by adding or deleting one or a few 
nucleotides at their extremities (either 3' or 5'), or substituting some non-essential nucleotides 
(i.e. nucleotides not essential to discriminate between types) by others (including modified 
nucleotides or inosine), these probes or variants thereof can be caused to hybridize specifically 
at the same hybridization conditions (i.e. the same temperature and the same hybridization 
solution). Changing the amount (concentration) of probe used may also be beneficial to obtain 
more specific hybridization results. It should be noted in this context that probes of the same 
length, regardless of their GC content, will hybridize specifically at approximately the same 
temperature in TMACl solutions (Jacobs et al., 1988). 

Suitable assay methods for purposes of the present invention to detect hybrids formed 
between the oligonucleotide probes and the nucleic acid sequences in a sample may comprise 
any of the assay formats known in the art, such as the conventional dot-blot format, 
sandwich hybridization or reverse hybridization. For example, the detection can be 
accomplished using a dot-blot format, the unlabelled amplified sample being bound to a 
membrane, the membrane being incubated with at least one labelled probe under suitable 
hybridization and wash conditions, and the presence of bound probe being monitored. 

An alternative and preferred method is a "reverse" dot-blot format, in which the amplified 
sequence contains a label. In this format, the unlabelled oligonucleotide probes are bound to 
a solid support and exposed to the labelled sample under appropriate stringent hybridization 
and subsequent washing conditions. It is to be understood that any other assay method 
which relies on the formation of a hybrid between the nucleic acids of the sample and the 
oligonucleotide probes according to the present invention may also be used. 

According to an advantageous embodiment, the process of detecting one or more HCV 
genotypes contained in a biological sample comprises the steps of contacting amplified HCV 
nucleic acid copies derived from the biological sample with oligonucleotide probes which 
have been immobilized as parallel lines on a solid support. 

According to this advantageous method, the probes are immobilized in a Line Probe Assay 
(LiPA) format. This is a reverse hybridization format (Saiki et al., 1989) using membrane 
strips onto which several oligonucleotide probes (including negative or positive control 
oligonucleotides) can be conveniently applied as parallel lines. 

The invention thus also relates to a solid support, preferably a membrane strip, carrying 
on its surface one or more probes as defined above, coupled to the support in the form of 
parallel lines. 

The LiPA is a very rapid and user-friendly hybridization test. Results can be read 4 h 
after the start of the amplification. After amplification, during which usually a non-isotopic 
label is incorporated in the amplified product, and alkaline denaturation, the amplified product 
is contacted with the probes on the membrane and the hybridization is carried out for about 
1 to 1.5 h, after which hybridized polynucleic acid is detected. From the hybridization pattern 
generated, the HCV type can be deduced either visually or, preferably, using dedicated software. 
The LiPA format is completely compatible with commercially available scanning devices, thus 
rendering automatic interpretation of the results very reliable. All those advantages make the 
LiPA format suitable for HCV detection in a routine setting. The LiPA format should 
be particularly advantageous for detecting the presence of different HCV genotypes. 

The present invention also relates to a method for detecting and identifying novel HCV 
genotypes, different from the known HCV genomes, comprising the steps of: 

determining to which HCV genotype the nucleotides present in a biological sample 
belong, according to the process as defined above, 

in the case of observing a sample which does not generate a hybridization pattern 
compatible with those defined in Table 3, sequencing the portion of the HCV 
genome sequence corresponding to the aberrantly hybridizing probe of the new 
HCV genotype to be determined. 

The present invention also relates to the use of a composition as defined above, for 
detecting one or more genotypes of HCV present in a biological sample liable to contain 
them, comprising the steps of: 

(i) possibly extracting sample nucleic acid, 

(ii) amplifying the nucleic acid with at least one of the primers as defined above, 

(iii) sequencing the amplified product, 

(iv) inferring the HCV genotypes present from the determined sequences by comparison 
to all known HCV sequences. 
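
Step (iv) can be sketched as a nearest-reference search. This is an illustrative simplification, not the comparison procedure of the invention: it assumes the determined sequence and all reference sequences have already been aligned to the same length, scores them by percentage of identical positions, and uses hypothetical reference names and toy sequences.

```python
from typing import Dict, Tuple


def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions between two pre-aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def closest_reference(query: str, references: Dict[str, str]) -> Tuple[str, float]:
    """Return the (name, identity) pair of the best-matching reference."""
    scored = ((name, percent_identity(query, seq)) for name, seq in references.items())
    return max(scored, key=lambda pair: pair[1])
```

For example, `closest_reference("ACGT", {"ref_x": "ACGA", "ref_y": "TTTT"})` selects `ref_x`, which matches at three of four positions.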

The present invention also relates to a composition consisting of or comprising at least one 
peptide or polypeptide comprising a contiguous sequence of at least 5 amino acids 
corresponding to a contiguous amino acid sequence encoded by at least one of the HCV 
genomic sequences as defined above, having at least one amino acid differing from the 
corresponding region of known HCV (type 1 and/or type 2 and/or type 3) polyprotein 
sequences as shown in Table 3, or muteins thereof. 

It is to be noted that, at the level of the amino acid sequence, an amino acid difference 
(with respect to known HCV amino acid sequences) is necessary, which means that the 
polypeptides of the invention correspond to polynucleic acids having a nucleotide difference 
(with known HCV polynucleic acid sequences) involving an amino acid difference. 
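
The requirement of at least one differing amino acid within a contiguous stretch can be illustrated with a sliding-window sketch. It assumes, for illustration only, that the new sequence and the known sequences are aligned without gaps and have equal length; the function name and the toy sequences in the usage note are hypothetical.

```python
from typing import List, Sequence, Tuple


def novel_peptides(new_seq: str, known_seqs: Sequence[str],
                   length: int = 5) -> List[Tuple[int, str]]:
    """List (1-based start, peptide) windows of `new_seq` that differ in at
    least one residue from the same window of every known sequence."""
    if any(len(known) != len(new_seq) for known in known_seqs):
        raise ValueError("all sequences must be aligned to the same length")
    peptides = []
    for start in range(len(new_seq) - length + 1):
        window = new_seq[start:start + length]
        if all(known[start:start + length] != window for known in known_seqs):
            peptides.append((start + 1, window))
    return peptides
```

With the toy input `novel_peptides("ACDEFGH", ["ACDEFGK"])`, only the window starting at position 3 is returned, because it is the only 5-residue window that covers the differing last residue.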

The new amino acid sequences, as deduced from the disclosed nucleotide sequences (see 
SEQ ID NO 1 to 62 and 106 to 123 and 143 to 218, 223 and 270), show homologies of only 
59.9 to 78% with prototype sequences of type 1 and 2 for the NS4 region, and of only 53.9 
to 68.8% with prototype sequences of type 1 and 2 for the E1 region. As the NS4 region is 
known to contain several epitopes, for example characterized in patent application EP-A-0 
489 968, and as the E1 protein is expected to be subject to immune attack as part of the viral 
envelope and expected to contain epitopes, the NS4 and E1 epitopes of the new type 3, 4 and 
5 isolates will consistently differ from the epitopes present in type 1 and 2 isolates. This is 
exemplified by the type-specificity of NS4 synthetic peptides as presented in example 4, and 
the type-specificity of recombinant E1 proteins in example 11. 

After aligning the new subtype 2d, type 3, 4 and 5 (see SEQ ID NO 1 to 62 and 106 to 
123 and 143 to 218, 223 and 270) amino acid sequences with the prototype sequences of type 
1a, 1b, 2a, and 2b, type- and subtype-specific variable regions can be delineated as presented 
in Figure 5 and 7. 

As to the muteins derived from the polypeptides of the invention, Table 4 gives an 
overview of the amino acid substitutions which could be the basis of some of the muteins as 
defined above. 

The peptides according to the present invention contain preferably at least 5 contiguous 
HCV amino acids, preferably however at least 8 contiguous amino acids, at least 10 or at 
least 15 (for instance at least 9, 11, 12, 13, 14, 20 or 25 amino acids) of the new HCV 
sequences of the invention. 

TABLE 4 

Amino acids    Synonymous groups 

Ser (S)        Ser, Thr, Gly, Asn 
Arg (R)        Arg, His, Lys, Glu, Gln 
Leu (L)        Leu, Ile, Met, Phe, Val, Tyr 
Pro (P)        Pro, Ala, Thr, Gly 
Thr (T)        Thr, Pro, Ser, Ala, Gly, His, Gln 
Ala (A)        Ala, Pro, Gly, Thr 
Val (V)        Val, Met, Ile, Tyr, Phe, Leu, Val 
Gly (G)        Gly, Ala, Thr, Pro, Ser 
Ile (I)        Ile, Met, Leu, Phe, Val, Ile, Tyr 
Phe (F)        Phe, Met, Tyr, Ile, Leu, Trp, Val 
Tyr (Y)        Tyr, Phe, Trp, Met, Ile, Val, Leu 
Cys (C)        Cys, Ser, Thr, Met 
His (H)        His, Gln, Arg, Lys, Glu, Thr 
Gln (Q)        Gln, Glu, His, Lys, Asn, Thr, Arg 
Asn (N)        Asn, Asp, Ser, Gln 
Lys (K)        Lys, Arg, Glu, Gln, His 
Asp (D)        Asp, Asn, Glu, Gln 
Glu (E)        Glu, Gln, Asp, Lys, Asn, His, Arg 
Met (M)        Met, Ile, Leu, Phe, Val 
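
As an illustration of how Table 4 could serve as a basis for muteins, the sketch below enumerates all single-substitution variants of a peptide. Only four rows of Table 4 (Ser, Asn, Asp, Met) are encoded, in one-letter code, to keep the example short; the function name is hypothetical, and whether a given mutein retains the desired properties is not addressed.

```python
from typing import Dict, List

# Excerpt of Table 4 in one-letter code: residue -> allowed replacements.
# Only the Ser, Asn, Asp and Met rows are included in this sketch.
SYNONYMOUS: Dict[str, str] = {
    "S": "TGN",
    "N": "DSQ",
    "D": "NEQ",
    "M": "ILFV",
}


def single_muteins(peptide: str, groups: Dict[str, str] = SYNONYMOUS) -> List[str]:
    """All variants of `peptide` carrying exactly one Table 4 substitution."""
    variants = []
    for index, residue in enumerate(peptide):
        for substitute in groups.get(residue, ""):
            variants.append(peptide[:index] + substitute + peptide[index + 1:])
    return variants
```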



The polypeptides of the invention, and particularly the fragments, can be prepared by 
classical chemical synthesis. 

The synthesis can be carried out in homogeneous solution or in solid phase. 

For instance, the synthesis technique in homogeneous solution which can be used is the one 
described by Houben-Weyl in the book entitled "Methoden der organischen Chemie" (Methods 
of organic chemistry) edited by E. Wünsch, vol. 15-I and II, Thieme, Stuttgart 1974. 

The polypeptides of the invention can also be prepared in solid phase according to the 
methods described by Atherton and Sheppard in their book entitled "Solid phase peptide 
synthesis" (IRL Press, Oxford, 1989). 

The polypeptides according to this invention can be prepared by means of recombinant 
DNA techniques as described by Maniatis et al. (Molecular Cloning: A Laboratory Manual, 
New York, Cold Spring Harbor Laboratory, 1982). 

The present invention relates particularly to a polypeptide or peptide composition as 
defined above, wherein said contiguous sequence contains in its sequence at least one of the 
following amino acid residues: 
L7, Q43, M44, S60, R67, Q70, T71, A79, A87, N106, K115, A127, A190, S130, V134, 
G142, I144, E152, A157, V158, P165, S177 or Y177, I178, V180 or E180 or F182, R184, 
I186, H187, T189, A190, S191 or G191, Q192 or L192 or I192 or V192 or E192, N193 or 
H193 or P193, W194 or Y194, H195, A197 or I197 or V197 or T197, V202, I203 or L203, 
Q208, A210, V212, F214, T216, R217 or D217 or E217 or V217, H218 or N218, H219 or 
V219 or L219, L227 or I227, M231 or E231 or Q231, T232 or D232 or A232 or K232, 
Q235 or I235, A237 or T237, I242, I246, S247, S248, V249, S250 or Y250, I251 or V251 
or M251 or F251, D252, T254 or V254, L255 or V255, E256 or A256, M258 or F258 or 
V258, A260 or Q260 or S260, A261, T264 or Y264, M265, I266 or A266, A267, G268 or 
T268, F271 or M271 or V271, I277, M280 or H280, I284 or A284 or 184", V274, V291, 
N292 or S292, R293 or I293 or Y293, Q294 or R294, L297 or I297 or Q297, A299 or K299 
or Q299, N303 or T303, T308 or L308, T310 or F310 or A310 or D310 or V310, L313, 
G317 or Q317, L333, S351, A358, A359, A363, S364, A366, T369, L373, F376, Q386, 
I387, S392, I399, F402, I403, R405, D454, A461, A463, T464, K484, Q500, E501, S521, 
K522, H524, N528, S531, S532, V534, F536, F537, M539, I546, C1282, A1283, H1310, 
V1312, Q1321, P1368, V1372, V1373, K1405, Q1406, S1409, A1424, A1429, C1435, 
S1436, S1456, H1496, A1504, D1510, D1529, I1543, N1567, D1556, N1567, M1572, 
Q1579, L1581, S1583, F1585, V1595, E1606 or T1606, M1611, V1612 or L1612, P1630, 
C1636, P1651, T1656 or I1656, L1663, V1667, V1677, A1681, H1685, E1687, G1689, 
V1695, A1700, Q1704, Y1705, A1713, A1714 or S1714, M1718, D1719, A1721 or T1721, 
R1722, A1723 or V1723, H1726 or G1726, E1730, V1732, F1735, I1736, S1737, R1738, 
T1739, G1740, Q1741, K1742, Q1743, A1744, T1745, L1746, E1747 or K1747, I1749, 
A1750, T1751 or A1751, V1753, N1755, K1756, A1757, P1758, A1759, H1762, T1763, 
Y1764, P2645, A2647, K2650, K2653 or L2653, S2664, N2673, F2680, K2681, L2686, 
H2692, Q2695 or L2695 or I2695, V2712, F2715, V2719 or Q2719, T2722, T2724, S2725, 
R2726, G2729, Y2735, H2739, I2748, G2746 or I2746, I2748, P2752 or K2752, P2754 or 
T2754, T2757 or P2757, 

with said notation being composed of a letter representing the amino acid residue by its 
one-letter code, and a number representing the amino acid numbering according to Kato et al., 
1990 as shown in Table 1 (comparison with other isolates). See also the numbering in Figures 
2, 5, 7, and 11 (alignment amino acid sequences). 

Within the group of unique and new amino acid residues of the present invention, the 
following residues were found to be specific for the following types of HCV according to the 
HCV classification system used in the present invention: 

Q208, R217, E231, I235, I246, T264, I266, A267, F271, K299, L2686, Q2719 
which are specific for the HCV subtype 2d sequences of the present invention as 
shown in Fig. 5 and 2; 

Q43, S60, R67, F182, I186, H187, A190, S191, L192, W194, V202, L203, V219, 
Q231, D232, A237, T254, M280, Q299, T303, L308, and/or L313 which are 
specific for the Core/E1 region of HCV type 3 of the invention as shown in Fig. 
5; 

D1556, Q1579, L1581, S1584, F1585, E1606, V1612, P1630, C1636, T1656, 
L1663, H1685, E1687, G1689, V1695, Y1705, A1713, A1714, A1721, V1723, 
H1726, R1738, Q1743, A1744, E1747, I1749, A1751, A1759 and/or H1762 which 
are specific for the NS3/4 region of HCV type 3 sequences of the invention as 
shown in Fig. 7; 

K2665, D2666, R2670 which are specific for the NS5B region of HCV type 3 of 
the invention as shown in Fig. 2; 

L7, A79, A127, S130, E152, V158, S177 or Y177, V180 or E180, R184, T189, 
Q192 or E192 or I192, N193 or H193, I197 or V197, I203, A210, V212, E217, 
H218, H219, L227, A232, V249, I251 or M251, D252, L255 or V255, E256, 
M258 or V258 or F258, A260 or Q260, M265, T268, V271, V274, M280, I284, 
N292 or S292, Q294, L297 or I297, T308, A310 or D310 or V310 or T310, and 
G317 which are specific for the Core/E1 region of HCV type 4 sequences of the 
present invention as shown in Fig. 5; 

P2645, K2650, K2653, G2656, V2658, T2668, N2673 or N2673, K2681, H2686, 
D2691, L2692, Q2695 or L2695 or I2695, Y2704, V2712, F2715, V2719, I2722, 
S2725, G2729, Y2735, G2746 or I2746, P2752 or K2752, Q2753, P2754 or 
T2754, T2757 or P2757 which are specific for the NS5B region of the HCV type 
4 sequences of the present invention as shown in Fig. 2; 

M44, Q70, A87, N106, K115, V137, G142, P165, I178, F251, A299, N303, Q317 
which are specific for the Core/E1 region of the HCV type 4 sequences of the 
present invention as shown in Fig. 5; 

L333, S351, A358, A359, A363, S364, A366, T369, L373, F376, Q386, I387, 
S392, I399, F402, I403, R405, D454, A461, A463, T464, K484, Q500, E501, 
S521, K522, H524, N528, S532, V534, F537, M539, I546 which are specific for 
the E1/E2 region of the HCV type 5 sequences of the present invention as shown 
in Fig. 12; 

C1282, A1283, V1312, Q1321, P1368, V1372, K1405, Q1406, S1409, A1424, 
A1429, C1435, S1436, S1456, H1496, A1504, D1510, D1529, I1543, N1567, 
M1572, V1595, T1606, M1611, L1612, I1656, V1667, A1681, A1700, AUU, 
S1714, M1718, D1719, T1721, R1722, A1723, G1726, F1735, I1736, S1737, 
T1739, G1740, K1742, T1745, L1746, K1747, A1750, V1753, N1755, A1757, 
D1758, T1763, and Y1764 which are specific for the NS3/NS4 region of HCV 
type 5 sequences of the invention as shown in Fig. 7; 

A2647, L2653, S2674, F2680, T2724, R2726, Y2730, H2739 which are specific 
for the NS5B region of the HCV type 5 sequences of the present invention as 
shown in Fig. 2; 

A256, P1631, V1677, Q1704, E1730, V1732, Q1741 and T1751 which are specific 
for the HCV type 3 and 5 sequences of the present invention as shown in Fig. 5 
and 7; 

T71, A157, I227, T237, T240, Y250, V251, S260, M271, T2673, T2722, I2748 
which are specific for the HCV type 3 and 4 sequences of the present invention as 
shown in Fig. 5 and 2; 

V192, Y194, A197, P249, S250, R294 which are specific for the HCV type 4 and 
5 sequences of the present invention as shown in Fig. 5; 

I293 which is specific for the HCV type 4 and subtype 2d sequence of the present 
invention as shown in Fig. 5; 

D217 and R294 which are specific for the HCV type 3, 4 and 5 sequences of the 
present invention as shown in Fig. 5; 

L192 which is specific for the HCV type 3 and subtype 2d sequences of the present 
invention as shown in Fig. 5; 

G191 and T197 which are specific for the HCV type 3, 4 and subtype 2d sequences 
of the present invention as shown in Fig. 5; 

K232 which is specific for the HCV subtype 2d and type 5 sequences of the present 
invention as shown in Fig. 5, 

and with said notation being composed of a letter, unambiguously representing the amino acid 
by its one-letter code, and a number representing the amino acid numbering according to Kato 
et al., 1990 (see also Table 1 for comparison with other isolates), as well as Figure 2 (NS5 
region), Figure 5 (Core/E1 region), Figure 7 (NS3/NS4 region), Figure 12 (E1/E2 region). 
Some of the above-mentioned amino acids may be contained in type or subtype specific 
epitopes. 

For example M231 (detected in type 5) refers to a methionine at position 231. A glutamine 
(Q) is present at the same position 231 in type 3 isolates, whereas this position is occupied 
by an arginine in type 1 isolates and by a lysine (K) or asparagine (N) in type 2 isolates (see 
Figure 5). 
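
The residue notation explained above (one-letter code followed by the position) can be parsed mechanically. The sketch below is illustrative and uses hypothetical names; it accepts only the 20 standard one-letter codes, so a token that has lost its letter is rejected rather than guessed.

```python
import re
from typing import List, NamedTuple


class Residue(NamedTuple):
    amino_acid: str  # one-letter code
    position: int    # polyprotein position (numbering of Kato et al., 1990)


_TOKEN = re.compile(r"^([ACDEFGHIKLMNPQRSTVWY])(\d+)$")


def parse_residue(token: str) -> Residue:
    """Parse a single token such as 'M231'."""
    match = _TOKEN.match(token.strip())
    if match is None:
        raise ValueError(f"not a valid residue token: {token!r}")
    return Residue(match.group(1), int(match.group(2)))


def parse_alternatives(text: str) -> List[Residue]:
    """Parse alternatives such as 'M231 or E231 or Q231'."""
    return [parse_residue(token) for token in text.split(" or ")]
```

For instance, `parse_residue("M231")` yields amino acid `M` at position 231, matching the methionine example given above.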

The peptide or polypeptide according to this embodiment of the invention may optionally be 
labelled, or attached to a solid substrate, or coupled to a carrier molecule such as biotin, or 
mixed with a proper adjuvant. 

The variable region in the core protein (V-CORE in Fig. 5) has been shown to be useful 
for serotyping (Machida et al., 1992). The sequence of the disclosed type 5 sequence in this 
region shows type-specific features. The peptide from amino acid 70 to 78 shows the 
following unique sequence for the sequences of the present invention (see Figure 5): 

QPTGRSWGQ (SEQ ID NO 93) 

RSEGRTSWAQ (SEQ ID NO 220) 

and RTEGRTSWAQ (SEQ ID NO 221) 

Another preferred V-Core spanning region is the peptide spanning positions 60 to 78 of 
subtype 3c with sequence: 

SRRQPIPRARRTEGRSWAQ (SEQ ID NO 268) 

Five type-specific variable regions (V1 to V5) can be identified after aligning E1 amino 
acid sequences of the 4 genotypes, as shown in Figure 5. 

Region V1 encompasses amino acids 192 to 203; this is the amino-terminal 10 amino acids 
of the E1 protein. The following unique sequences as shown in Fig. 5 can be deduced: 

LEWRNTSGLYVL (SEQ ID NO 83) 

VNYRNASGIYHI (SEQ ID NO 126) 

QHYRNISGIYHV (SEQ ID NO 127) 

EHYRNASGIYHI (SEQ ID NO 128) 

IHYRNASGIYHI (SEQ ID NO 224) 

VPYRNASGIYHV (SEQ ID NO 84) 

VNYRNASGIYHI (SEQ ID NO 225) 

VNYRNASGVYHI (SEQ ID NO 226) 

VNYHNTSGIYHL (SEQ ID NO 227) 

QHYRNASGIYHV (SEQ ID NO 228) 

QHYRNVSGIYHV (SEQ ID NO 229) 

IHYRNASDGYYI (SEQ ID NO 230) 

LQVKNTSSSYMV (SEQ ID NO 231) 

Region V2 encompasses amino acids 213 to 223. The following unique sequences can be 
found in the V2 region as shown in Figure 5: 
VYEADDVILHT (SEQ ID NO 85) 
VYETEHHILHL (SEQ ID NO 129) 
VYEADHHIMHL (SEQ ID NO 130) 
VYETDHHILHL (SEQ ID NO 131) 
VYEADNLILHA (SEQ ID NO 86) 
VWQLRAFVLHV (SEQ ID NO 232) 
VYEADYHILHL (SEQ ID NO 233) 
VYETDNHILHL (SEQ ID NO 234) 
VYETENHILHL (SEQ ID NO 235) 
VFETVHHILHL (SEQ ID NO 236) 
VFETEHHILHL (SEQ ID NO 237) 
VFETDHHLMHL (SEQ ID NO 238) 
VYETENHILHL (SEQ ID NO 239) 
VYEADALILHA (SEQ ID NO 240) 

Region V3 encompasses the amino acids 230 to 242. The following unique V3 region 
sequences can be deduced from Figure 5: 
VQDGNTSTCWTPV (SEQ ID NO 87) 
VQDGNTSACWTPV (SEQ ID NO 241) 
VRVGNQSRCWVAL (SEQ ID NO 132) 
VRTGNTSRCWVPL (SEQ ID NO 133) 
VRAGNVSRCWTPV (SEQ ID NO 134) 
EEKGNISRCWIPV (SEQ ID NO 242) 
VKTGNQSRCWVAL (SEQ ID NO 243) 
VRTGNQSRCWVAL (SEQ ID NO 244) 
VKTGNQSRCWIAL (SEQ ID NO 245) 
VKTGNVSRCWIPL (SEQ ID NO 247) 
VKTGNVSRCWISL (SEQ ID NO 248) 
VRKDNVSRCWVQI (SEQ ID NO 249) 

Region V4 encompasses the amino acids 248 to 257. The following unique V4 region 
sequences can be deduced from Figure 5: 

VRYVGATTAS (SEQ ID NO 89) 

APYIGAPLES (SEQ ID NO 135) 

APYVGAPLES (SEQ ID NO 136) 

AVSMDAPLES (SEQ ID NO 137) 

APSLGAVTAP (SEQ ID NO 90) 

APSFGAVTAP (SEQ ID NO 250) 

VSQPGALTKG (SEQ ID NO 251) 

VKYVGATTAS (SEQ ID NO 252) 

APYIGAPVES (SEQ ID NO 253) 

AQHLNAPLES (SEQ ID NO 254) 

SPYVGAPLEP (SEQ ID NO 255) 

SPYAGAPLEP (SEQ ID NO 256) 

APYLGAPLEP (SEQ ID NO 257) 

APYLGAPLES (SEQ ID NO 258) 

APYVGAPLES (SEQ ID NO 259) 

VPYLGAPLTS (SEQ ID NO 260) 

APHLRAPLSS (SEQ ID NO 261) 

APYLGAPLTS (SEQ ID NO 262) 
Region V5 encompasses the amino acids 294 to 303. The following unique V5 region 
peptides can be deduced from figure 5: 

RPRRHQTVQT (SEQ ID NO 91) 

QPRRHWTTQD (SEQ ID NO 138) 

RPRRHWTTQD (SEQ ID NO 139) 

RPRQHATVQN (SEQ ID NO 92) 

RPRQHATVQD (SEQ ID NO 263) 

SPQHHKFVQD (SEQ ID NO 264) 

RPRRLWTTQE (SEQ ID NO 265) 

PPRIHETTQD (SEQ ID NO 266) 
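
The five E1 variable regions are defined purely by position ranges, so they can be cut out of any sequence whose numbering is known. The following is a minimal sketch under the assumption of a gap-free sequence; the function name is hypothetical, and the coordinates are the ones stated in the text (V1 192 to 203, V2 213 to 223, V3 230 to 242, V4 248 to 257, V5 294 to 303).

```python
from typing import Dict, Tuple

# Inclusive position ranges of the E1 variable regions as given in the text.
E1_VARIABLE_REGIONS: Dict[str, Tuple[int, int]] = {
    "V1": (192, 203),
    "V2": (213, 223),
    "V3": (230, 242),
    "V4": (248, 257),
    "V5": (294, 303),
}


def extract_regions(sequence: str, first_position: int,
                    regions: Dict[str, Tuple[int, int]] = E1_VARIABLE_REGIONS
                    ) -> Dict[str, str]:
    """Return the regions fully covered by `sequence`, whose first residue
    carries the polyprotein number `first_position` (no alignment gaps)."""
    last_position = first_position + len(sequence) - 1
    found = {}
    for name, (start, end) in regions.items():
        if start >= first_position and end <= last_position:
            found[name] = sequence[start - first_position:end - first_position + 1]
    return found
```

Applied to the 12-residue peptide of SEQ ID NO 83 numbered from position 192, the function returns that peptide as region V1 and no other region.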

The variable region in the E2 region (HVR-2) of type 5a as shown in Figure 12 spanning 
amino acid positions 471 to 484 is also a preferred peptide according to the present invention 
with the following sequence: 
TISYANGSGPSDDK (SEQ ID NO 267) 

The above given list of peptides is particularly suitable for vaccine and diagnostic 
development. 

Also comprised in the present invention is any synthetic peptide or polypeptide containing 
at least 5 contiguous amino acids derived from the above-defined peptides in their peptidic 
chain. 

According to a specific embodiment, the present invention relates to a composition as 
defined above, wherein said contiguous sequence is selected from any of the following HCV 
amino acid type 3 sequences: 

- a sequence having a homology of more than 72%, preferably more than 74%, more 
preferably more than 77% and most preferably more than 80 or 84% homology to any of 
the amino acid sequences as represented in SEQ ID NO 14, 16, 18, 20, 22, 24, 26 or 28 
(HD10, BR36, BR33 sequences) in the region spanning positions 140 to 319 in the 
Core/E1 region as shown in Figure 5; 

- a sequence having a homology of more than 70%, preferably more than 72%, more 
preferably more than 75% homology, most preferably more than 81% homology to any 
of the amino acid sequences as represented in SEQ ID NO 14, 16, 18, 20, 22, 24, 26 or 
28 (HD10, BR36, BR33 sequences) in the E1 region spanning positions 192 to 319 as 
shown in Figure 5; 

- a sequence having a homology of more than 86%, preferably more than 88%, and most 
preferably more than 90% homology to the amino acid sequences as represented in SEQ 
ID NO 148 (type 3c; BE98) in the region spanning positions 1 to 110 in the Core region 
as shown in Figure 5; 

- a sequence having a homology of more than 76%, preferably more than 78%, most 
preferably more than 80% to any of the amino acid sequences as represented in SEQ ID 
NO 30, 32, 34, 36, 38 or 40 (HCC153, HD10, BR36 sequences) in the region spanning 
positions 1646 to 1764 in the NS3/NS4 region as shown in Figure 7 and 11; 

- a sequence having a homology of more than 81%, preferably more than 83%, and most 
preferably more than 86% homology to any of the amino acid sequences as represented 
in SEQ ID NO 14, 16, 18, 20, 22, 24, 26 or 28 (HD10, BR36, BR33 sequences) in the 
region spanning positions 140 to 319 in the Core/E1 region as shown in Figure 5; 

- a sequence having a homology of more than 81.5%, preferably more than 83%, and most 
preferably more than 86% homology to any of the amino acid sequences as represented 
in SEQ ID NO 14, 16, 18, 20, 22, 24, 26 or 28 (HD10, BR36, BR33 sequences) in the 
E1 region spanning positions 192 to 319 as shown in Figure 5; 

- a sequence having a homology of more than 86%, preferably more than 88%, most 
preferably more than 90% to the amino acid sequence as represented in SEQ ID NO 150 
(type 3c; BE98) in the region spanning positions 2645 to 2757 in the NS5B region as shown 
in Figure 2. 

According to yet another embodiment, the present invention relates to a composition as 
defined above, wherein said contiguous sequence is selected from any of the following HCV 
amino acid type 4 sequences: 

- a sequence having a homology of more than 80%, preferably more than 82%, most 
preferably more than 84% homology to any of the amino acid sequences as represented 
in SEQ ID NO 118, 120, and 122 (GB358, GB549, GB809 sequences) in the region 
spanning positions 127 to 319 of the Core/E1 region as shown in Figure 5; 

- a sequence having a homology of more than 73%, preferably more than 75%, most 
preferably more than 78% homology in the E1 region spanning positions 192 to 319 to any 
of the amino acid sequences as represented in SEQ ID NO 118, 120, and 122 (GB358, 
GB549, GB809 sequences) in the region spanning positions 140 to 319 of the Core/E1 
region as shown in Figure 5; 

- a sequence having more than 85%, preferably more than 86%, most preferably more than 
87% homology to any of the amino acid sequences as represented in SEQ ID NO 118, 120 
or 122 (GB358, GB549, GB809 sequences) in the region spanning positions 192 to 319 of 
E1 as shown in Figure 5; 

- a sequence showing more than 73%, preferably more than 74%, most preferably more than 
75% homology to any of the amino acid sequences as represented in SEQ ID NO 106, 
108, 110, 112, 114 or 116 (GB48, GB116, GB215, GB358, GB549, GB809 sequences) 
in the region spanning positions 2645 to 2757 of the NS5B region, as shown in Figure 2; 

- a sequence having any of the sequences as represented in SEQ ID NO 164 or 166 (GB809 
and CAM600 sequences) in the Core/E1 region as shown in Figure 5; 

- a sequence having any of the sequences as represented in SEQ ID NO 168, 170, 172, 174, 
176, 178, 180, 182, 184, 186, 188 or 190 (CAM600, GB809, CAMG22, CAMG27, 
GB549, GB438, CAR4/1205, CAR4/901, GB116, GB215, GB958, GB809-4 sequences) 
in the E1 region as shown in Figure 5; 

- a sequence having any of the sequences as represented in SEQ ID NO 192, 194, 196, 198, 
200, 202, 204, 206, 208, 210, 212 (GB358, GB724, BE100, PC, CAM600, CAMG- 
etc.) in the NS5B region. 

The above-mentioned type 4 peptides or polypeptides comprise at least an amino acid 
sequence selected from any HCV type 4 polyprotein with the exception of the core sequence as 
disclosed by Simmonds et al. (1993, EG-29, see Figure 5). 

According to yet another aspect, the present invention relates to a composition as defined 
above, wherein said contiguous sequence is selected from any of the following HCV amino 
acid type 5 sequences: 

- a sequence having more than 93%, preferably more than 94%, most preferably more than 
95% homology in the region spanning Core positions 1 to 191 to any of the amino acid 
sequences as represented in SEQ ID NO 42, 44, 46, 48, 50, 52 or 54 (PC sequences) and 
SEQ ID NO 152 (BE95) as shown in Figure 5; 

- a sequence having more than 73%, preferably more than 74%, most preferably 
more than 76% homology in the region spanning E1 positions 192 to 319 to any 
of the amino acid sequences as represented in SEQ ID NO 42, 44, 46, 48, 50, 52 
or 54 (PC sequences) as shown in Figure 5; 

- a sequence having more than 78%, preferably more than 80%, most preferably more 
than 83% homology to any of the amino acid sequences as represented in SEQ ID NO 42, 
44, 46, 48, 50, 52, 54, 154, 156 (BE95, BE100) (PC sequences) in the region spanning 
positions 1 to 319 of the Core/E1 region as shown in Figure 5; 

- a sequence having more than 90%, preferably more than 91%, most preferably more than 
92% homology to any of the amino acid sequences represented in SEQ ID NO 56 to 58 
(PC sequences) in the region spanning positions 1286 to 1403 of the NS3 region as shown 
in Figure 7 or 11; 

- a sequence having more than 66%, more particularly 68%, most particularly 70% or more 
homology to any of the amino acid sequences as represented in SEQ ID NO 60 or 62 (PC 
sequences) in the region spanning positions 1646 to 1764 of the NS3/4 region as shown 
in Figure 7 or 11. 

According to yet another embodiment, the present invention relates to a 
composition as defined above, wherein said contiguous sequence is selected from any of 
the following HCV amino acid type 2d sequences: 

- a sequence having more than 83%, preferably more than 85%, most preferably more than 
87% homology to the amino acid sequence as represented in SEQ ID NO 144 (NE92) in 
the region spanning positions 1 to 319 of the Core/E1 region as shown in Figure 5; 

- a sequence having more than 79%, preferably more than 81%, most preferably more than 
84% homology in the region spanning E1 positions 192 to 319 to the amino acid sequence 
as represented in SEQ ID NO 144 (NE92) as shown in Figure 12; 

- a sequence having more than 95%, more particularly 96%, most particularly 97% or more 
homology to the amino acid sequence as represented in SEQ ID NO 146 (NE92) in the 
region spanning positions 2645 to 2757 of the NS5B region as shown in Figure 2. 
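
Each of the homology criteria above is a graded set of thresholds ("more than", "preferably more than", "most preferably more than"). As an illustration only, the sketch below classifies a precomputed homology percentage against such a triple. The default thresholds are the 83%, 85% and 87% values stated above for the subtype 2d Core/E1 region; the function name and tier labels are hypothetical, and how the homology percentage itself is computed is left outside the sketch.

```python
from typing import Sequence

TIER_LABELS = ("minimum", "preferred", "most preferred")


def homology_tier(homology_percent: float,
                  thresholds: Sequence[float] = (83.0, 85.0, 87.0)) -> str:
    """Return the highest tier whose threshold is strictly exceeded."""
    tier = "below threshold"
    for label, threshold in zip(TIER_LABELS, thresholds):
        if homology_percent > threshold:
            tier = label
    return tier
```

The comparison is strict, mirroring the "more than" wording, so a value exactly equal to a threshold does not reach that tier.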

The present invention also relates to a recombinant vector, particularly for cloning and/or 
expression, with said recombinant vector comprising a vector sequence, an appropriate 
prokaryotic, eukaryotic or viral promoter sequence followed by the nucleotide sequences as 
defined above, with said recombinant vector allowing the expression of any one of the HCV 
type 2 and/or HCV type 3 and/or type 4 and/or type 5 derived polypeptides as defined above 
in a prokaryotic or eukaryotic host or in living mammals when injected as naked DNA, and 
more particularly a recombinant vector allowing the expression of any of the following HCV 
type 2d, type 3, type 4 or type 5 polypeptides spanning the following amino acid positions: 

- a polypeptide starting at position 1 and ending at any position in the region between 
positions 70 and 326, more particularly a polypeptide spanning positions 1 to 70, 
positions 1 to 85, positions 1 to 120, positions 1 to 150, positions 1 to 191, positions 1 to 
200, for expression of the Core protein, and a polypeptide spanning positions 1 to 
263, positions 1 to 326, for expression of the Core and E1 protein; 

- a polypeptide starting at any position in the region between positions 117 and 192, 
and ending at any position in the region between positions 263 and 326, for 
expression of E1, or forms that have the putative membrane anchor deleted 
(positions 264 to 293 plus or minus 8 amino acids); 

- a polypeptide starting at any position in the region between positions 1556 and 
1688, and ending at any position in the region between positions 1739 and 1764, 
for expression of the NS4 regions, more particularly a polypeptide starting at 
position 1658 and ending at position 1711 for expression of the NS4a antigen, and 
more particularly, a polypeptide starting at position 1712 and ending between 
positions 1743 and 1972, for example 1712-1743, 1712-1764, 1712-1782, 
1712-1972, 1712 to 1782 and 1902 to 1972 for expression of the NS4b protein or parts 
thereof. 

The term "vector" may comprise a plasmid, a cosmid, a phage, or a virus. 

In order to carry out the expression of the polypeptides of the invention in bacteria such 
as E. coil or in eukar>'otic cells such as in S. cerevisiae, or in cultured vertebrate or 
invenebrate hosts such as insect ceils, Chinese Hamster Ovary (CHO), COS. BHK, and 
MDCK cells, die following steps are carried out: 

transformation of an appropriate cellular host widi a recombinant vector, in which 
a nucleotide sequence coding for one of the polypeptides of the invention has been 
inserted under the control of the appropriate regulatory elements, panicularly a 
promoter recognized by the polymerases of the cellular host and, in the case of a 
prokaryotic host, an appropriate ribosome binding site (RBS),' enabling the 
expression in said cellular host of said nucleotide sequence. In the case of an 
eukaryotic host any artificial signal sequence or pre/pro sequence might be 
provided, or the natural HCV signal sequence might be employed, e.g. for 
expression of El the signal sequence starting between amino acid positions 1 17 and 
170 and ending at amino acid position 191 can be used, for expression of NS4, the 
signal sequence starting between amino acid positions 1646 and 1659 can be used, 
culture of said transformed cellular host under conditions enabling the expression 
of said insert. 
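
The amino acid ranges recited above for the expression constructs can be read as 1-based, inclusive positions on the HCV polyprotein. The following Python sketch illustrates that convention, including the variant in which the putative membrane anchor (positions 264 to 293) is deleted. The helper names are hypothetical, and any polyprotein string passed in is the caller's own data; nothing here is taken from the sequence listing.

```python
def subsequence(polyprotein: str, start: int, end: int) -> str:
    """Return residues start..end of the polyprotein (1-based, inclusive)."""
    if not 1 <= start <= end <= len(polyprotein):
        raise ValueError("positions must satisfy 1 <= start <= end <= length")
    return polyprotein[start - 1:end]


def without_anchor(polyprotein: str, start: int, end: int,
                   anchor: tuple = (264, 293)) -> str:
    """Return residues start..end with the anchor residues removed.

    The default anchor is the putative E1 membrane anchor named in the text;
    the 'plus or minus 8 amino acids' tolerance is left to the caller.
    """
    a_start, a_end = anchor
    if not start < a_start <= a_end < end:
        raise ValueError("anchor must lie strictly inside the requested range")
    return (subsequence(polyprotein, start, a_start - 1)
            + subsequence(polyprotein, a_end + 1, end))
```

For example, `without_anchor(seq, 192, 326)` would give an E1 construct of 105 residues (135 residues minus the 30-residue anchor).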

The present invention also relates to a composition as defined above, wherein said 
polypeptide is a recombinant polypeptide expressed by means of an expression vector as 
defined above. 

The present invention also relates to a composition as defined above, for use in a method 
for immunizing a mammal, preferably humans, against HCV comprising administering a 
sufficient amount of the composition possibly accompanied by pharmaceutically acceptable 
adjuvants, to produce an immune response, more particularly a vaccine composition including 
HCV type 3 polypeptides derived from the Core, E1 or the NS4 region and/or HCV type 4 
and/or HCV type 5 polypeptides and/or HCV type 2d polypeptides. 

The present invention also relates to an antibody raised upon immunization with a 
composition as defined above by means of a process as defined above, with said antibody 
being reactive with any of the polypeptides as defined above, and with said antibody being 
preferably a monoclonal antibody. 

The monoclonal antibodies of the invention can be produced by any hybridoma liable 
to be formed according to classical methods from splenic cells of an animal, particularly from 
a mouse or rat, immunized against the HCV polypeptides according to the invention, or 
muteins thereof, or fragments thereof as defined above on the one hand, and of cells of a 
myeloma cell line on the other hand, and to be selected by the ability of the hybridoma to 
produce the monoclonal antibodies recognizing the polypeptides which have been initially used 
for the immunization of the animals. 

The antibodies involved in the invention can be labelled by an appropriate label of the 
enzymatic, fluorescent, or radioactive type. 

The monoclonal antibodies according to this preferred embodiment of the invention may 
be humanized versions of mouse monoclonal antibodies made by means of recombinant DNA 
technology, departing from parts of mouse and/or human genomic DNA sequences coding 
for H and L chains, or from cDNA clones coding for H and L chains. 

Alternatively the monoclonal antibodies according to this preferred embodiment of the 
invention may be human monoclonal antibodies. These antibodies according to the present 
embodiment of the invention can also be derived from human peripheral blood lymphocytes 
of patients infected with type 3, type 4 or type 5 HCV, or vaccinated against HCV. Such 
human monoclonal antibodies are prepared, for instance, by means of human peripheral blood 
lymphocytes (PBL) repopulation of severe combined immune deficiency (SCID) mice (for 
recent review, see Duchosal et al., 1992). 

The invention also relates to the use of the proteins of the invention, muteins thereof, or 
peptides derived therefrom for the selection of recombinant antibodies by the process of 
repertoire cloning (Persson et al., 1991). 

Antibodies directed to peptides derived from a certain genotype may be used either for 
the detection of such HCV genotypes, or as therapeutic agents. 

The present invention also relates to the use of a composition as defined above for 
incorporation into an immunoassay for detecting HCV, present in a biological sample liable to 
contain it, comprising at least the following steps: 

(i) contacting the biological sample to be analyzed for the presence of HCV antibodies 
with any of the compositions as defined above, preferably in an immobilized form 
under appropriate conditions which allow the formation of an immune complex, 
wherein said polypeptide can be a biotinylated polypeptide which is covalently 
bound to a solid substrate by means of streptavidin or avidin complexes, 

(ii) removing unbound components, 

(iii) incubating the immune complexes formed with heterologous antibodies, which 
specifically bind to the antibodies present in the sample to be analyzed, with said 
heterologous antibodies being conjugated to a detectable label under appropriate 
conditions, 

(iv) detecting the presence of said immune complexes visually or by means of 
densitometry and inferring the HCV serotype present from the observed 
hybridization pattern. 

The present invention also relates to the use of a composition as defined above, for 
incorporation into a serotyping assay for detecting one or more serological types of HCV 
present in a biological sample liable to contain it, more particularly for detecting E1 and NS4 
antigens or antibodies of the different types to be detected combined in one assay format, 
comprising at least the following steps: 

(i) contacting the biological sample to be analyzed for the presence of HCV antibodies 
or antigens of one or more serological types with at least one of the compositions 
as defined above, in an immobilized form under appropriate conditions which allow 
the formation of an immune complex, 

(ii) removing unbound components, 

(iii) incubating the immune complexes formed with heterologous antibodies, which 
specifically bind to the antibodies present in the sample to be analyzed, with said 
heterologous antibodies being conjugated to a detectable label under appropriate 
conditions, 

(iv) detecting the presence of said immune complexes visually or by means of 
densitometry and inferring the presence of one or more HCV serological types 
present from the observed binding pattern. 

The present invention also relates to the use of a composition as defined above, for 
immobilization on a solid substrate and incorporation into a reversed phase hybridization 
assay, preferably for immobilization as parallel lines onto a solid support such as a membrane 
strip, for determining the presence or the genotype of HCV according to a method as defined 
above. 

The present invention thus also relates to a kit for determining the presence of HCV 
genotypes as defined above present in a biological sample liable to contain them, comprising: 

- possibly at least one primer composition containing any primer selected from those 
defined above or any other HCV type 3 and/or HCV type 4 and/or HCV type 5, 
or universal HCV primers, 

- at least one probe composition as defined above, with said probes being 
preferentially immobilized on a solid substrate, and more preferentially on one and 
the same membrane strip, 

- a buffer or components necessary for producing the buffer enabling the hybridization 
reaction between these probes and the possibly amplified products to be carried out, 

- means for detecting the hybrids resulting from the preceding hybridization, 

- possibly also including an automated scanning and interpretation device for 
inferring the HCV genotypes present in the sample from the observed hybridization 
pattern. 

The genotype may also be detected by means of a type-specific antibody as defined above, 
which is linked to any polynucleotide sequence that can afterwards be amplified by PCR to 
detect the immune complex formed (Immuno-PCR, Sano et al., 1992). 

The present invention also relates to a kit for determining the presence of HCV antibodies 
as defined above present in a biological sample liable to contain them, comprising: 

- at least one polypeptide composition as defined above, preferentially in combination 
with other polypeptides or peptides from HCV type 1, HCV type 2 or other types 
of HCV, with said polypeptides being preferentially immobilized on a solid 
substrate, and more preferentially on one and the same membrane strip, 

- a buffer or components necessary for producing the buffer enabling the binding 
reaction between these polypeptides and the antibodies against HCV present in the 
biological sample, 

- means for detecting the immune complexes formed in the preceding binding 
reaction, 

- possibly also including an automated scanning and interpretation device for 
inferring the HCV genotypes present in the sample from the observed binding 
pattern. 

Figure Legends 

Figure 1 

Alignment of consensus nucleotide sequences for each of the type 3a isolates BR34, BR36, 
and BR33, deduced from the clones with SEQ ID NO 1, 5, 9; type 4 isolates GB48, GB116, 
GB215, GB358, GB549, GB809, CAM600, CAMG22, GB438, CAR4/1205, CAR1/501 
(SEQ ID NO 106, 108, 110, 112, 114, 116, 201, 203, 205, 207, 209 and 211); type 5a 
isolates BE95 and BE96 (SEQ ID NO 159 and 161) and type 2d isolate NE92 (SEQ ID NO 
145) from the region between nucleotides 7932 and 8271, with known sequences from the 
corresponding region of isolates HCV-1, HCV-J, HC-J6, HC-J8, T1 and T9, and others as 
shown in Table 3. 

Figure 2 

Alignment of amino acid sequences deduced from the nucleic acid sequences as 
represented in Figure 1 from the subtype 3a clones BR34 (SEQ ID NO 2, 4), BR36 (SEQ ID 
NO 6, 8) and BR33 (SEQ ID NO 10, 12), the subtype 3c clone BE98 (SEQ ID NO 150), and 
the type 4 clones GB48 (SEQ ID NO 107), GB116 (SEQ ID NO 109), GB215 (SEQ ID NO 
111), GB358 (SEQ ID NO 113), GB549 (SEQ ID NO 115), GB809 (SEQ ID NO 117), 
CAM600, CAMG22, GB438, CAR4/1205, CAR1/501 (SEQ ID NO 202, 204, 206, 208, 
210, 212); the type 5a clones BE95 and BE96 (SEQ ID NO 160 and 162); as well as the 
subtype 2d isolate NE92 (SEQ ID NO 146) from the region between amino acids 2645 to 
2757 with known sequences from the corresponding region of isolates HCV-1, HCV-J, HC-J6, 
and HC-J8, T1 and T9, and other sequences as shown in Table 3. 

Figure 3 

Alignment of type 2d, 3c, 4 and 5a nucleotide sequences from isolates NE92, BE98, 
GB358, GB809, CAM600, GB724, BE95 (SEQ ID NO 143, 147, 191, 163, 165, 193 and 
151) in the Core region between nucleotide positions 1 and 500, with known sequences from 
the corresponding region of type 1, type 2, type 3 and type 4 sequences. 

Figure 4 

Alignment of nucleotide sequences for the subtype 2d isolate NE92 (SEQ ID NO 143), the 
type 4 isolates GB358 (SEQ ID NO 118 and 187), GB549 (SEQ ID NO 120 and 175), and 
GB809-2 (SEQ ID NO 122 and 169), GB809-4, GB116, GB215, CAM600, CAMG22, 
CAMG27, GB438, CAR4/1205, CAR4/901 (SEQ ID NO 189, 183, 185, 167, 171, 173, 177, 
179, 181), sequences for each of the subtype 3a isolates HD10, BR36, and BR33 (SEQ ID 
NO 13, 15, 17 (HD10), 19, 21 (BR36) and 23, 25 or 27 (BR33)) and the subtype 5a isolates 
BE95 and BE100 (SEQ ID NO 143 and 195) from the region between nucleotides 379 and 
957, with known sequences from the corresponding region of type 1 and 2 and 3. 

Figure 5 

Alignment of amino acid sequences deduced from the new HCV nucleotide sequences of 
the Core/E1 region of isolates BR33, BR36, HD10, GB358, GB549, and GB809, PC or 
BE95, CAM600, and GB724 (SEQ ID NO 14, 20, 24, 119 or 192, 121, 123 or 164, 54 or 
152, 166 and 194) from the region between positions 1 and 319, with known sequences from 
type 1a (HCV-1), type 1b (HCV-J), type 2a (HC-J6), type 2b (HC-J8), NZL1, HCV-TR, 
positions 7-89 of type 3a (E-b1), and positions 8-88 of type 4a (EG-29). V-Core, variable 
region with type-specific features in the core protein; V1, variable region 1 of the E1 protein; 
V2, variable region 2 of the E1 protein; V3, variable region 3 of the E1 protein; V4, variable 
region 4 of the E1 protein; V5, variable region 5 of the E1 protein. 

Figure 6 

Alignment of nucleotide sequences of isolates HCCL53, HD10 and BR36, deduced from 
clones with SEQ ID NO 29, 31, 33, 35, 37 and 39, from the NS3/4 region between 
nucleotides 4664 to 5292, with known sequences from the corresponding region of isolates 
HCV-1, HCV-J, HC-J6, and HC-J8, EB1, EB2, EB6 and EB7. 

Figure 7 

Alignment of amino acid sequences deduced from the new HCV nucleotide sequences of 
the NS3/NS4 region of isolate BR36 (SEQ ID NO 36, 38 and 40) and BE95 (SEQ ID NO 
270). NS4-1 indicates the region that was synthesized as synthetic peptide 1 of the NS4 
region; NS4-5 indicates the region that was synthesized as synthetic peptide 5 of the NS4 
region; NS4-7 indicates the region that was synthesized as synthetic peptide 7 of the NS4 
region. 

Figure 8 

Reactivity of the three LiPA-selected (Stuyver et al., 1993) type 3 sera on the Inno-LIA 
HCV Ab II assay (Innogenetics) (left), and on the NS4-LIA test. For the NS4-LIA test, NS4-1, 
NS4-5, and NS4-7 peptides were synthesized based on the type 1 (HCV-1), type 2 (HC-J6) 
and type 3 (BR36) prototype isolate sequences as shown in Table 4, and applied as parallel 
lines onto a membrane strip as indicated. 1, serum BR33; 2, serum HD10; 3, serum DKH. 

Figure 9 

Nucleotide sequences of Core/E1 clones obtained from the PCR fragments PC-2, PC-3 
and PC-4, obtained from serum BE95 (PC-2-1 (SEQ ID NO 41), PC-2-6 (SEQ ID NO 43), 
PC-4-1 (SEQ ID NO 45), PC-4-6 (SEQ ID NO 47), PC-3-4 (SEQ ID NO 49), and PC-3-8 
(SEQ ID NO 51)) of subtype 5a isolate BE95. 

A consensus sequence is shown for the Core and E1 region of isolate BE95, presented as 
PC C/E1 with SEQ ID NO 53. Y, C or T; R, A or G; S, C or G. 
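
The ambiguity symbols used in this legend (Y, R, S) follow the standard IUPAC nucleotide code. As a minimal sketch of how a consensus of this kind could be derived from aligned clones, the Python function below encodes every column in which the clones differ as the corresponding ambiguity symbol. This is an assumed rule chosen for illustration (the document does not state how its consensus was compiled), and the function name is hypothetical.

```python
# Standard IUPAC nucleotide ambiguity codes, keyed by the set of bases observed.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def consensus(clones: list) -> str:
    """Consensus of equal-length, gap-free aligned clones (A/C/G/T only)."""
    if not clones or len({len(c) for c in clones}) != 1:
        raise ValueError("clones must be non-empty and of equal length")
    columns = zip(*(c.upper() for c in clones))
    return "".join(IUPAC[frozenset(column)] for column in columns)
```

A majority-rule consensus would be an equally plausible alternative; the choice here simply keeps every observed variant visible.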

Figure 10 

Alignment of nucleotide sequences of clones with SEQ ID NO 197 and 199 (PC sequences, 
see also SEQ ID NO 55, 57, 59) and SEQ ID NO 35, 37 and 39 (BR36 sequences) from the 
NS3/4 region between nucleotides 3856 to 5292, with known sequences from the 
corresponding region of isolates HCV-1, HCV-J, HC-J6, and HC-J8. 

Figure 11 

Alignment of amino acid sequences of subtype 5a BE95 isolate PC clones with SEQ ID 
NO 56 and 58, from the NS3/4 region between amino acids 1286 to 1764, with known 
sequences from the corresponding region of isolates HCV-1, HCV-J, HC-J6, and HC-J8. 

Figure 12 

Alignment of amino acid sequences of subtype 5a isolate BE95 (SEQ ID NO 158) in the 
E1/E2 region spanning positions 328 to 546, with known sequences from the corresponding 
region of isolates HCV-1, HCV-J, HC-J6, HC-J8, NZL1 and HCV-TR (see Table 3). 

Figure 13 

Alignment of the nucleotide sequences of subtype 5a isolate BE95 (SEQ ID NO 157) in 
the E1/E2 region with known HCV sequences as shown in Table 3. 

EXAMPLES 



Example 1: The NS5b region of HCV type 3 

Type 3 sera, selected by means of the INNO-LiPA HCV research kit (Stuyver et al., 1993) 
from a number of Brazilian blood donors, were positive in the HCV antibody ELISA 
(Innotest HCV Ab II; Innogenetics) and/or in the INNO-LIA HCV Ab II confirmation test 
(Innogenetics). Only those sera that were positive after the first round of PCR reactions 
(Stuyver et al., 1993) were retained for further study. 

Reverse transcription and nested PCR: RNA was extracted from 50 µl serum and subjected 
to cDNA synthesis as described (Stuyver et al., 1993). This cDNA was used as template for 
PCR, for which the total volume was increased to 50 µl containing 10 pmoles of each primer, 
3 µl of 10x Pfu buffer 2 (Stratagene) and 2.5 U of Pfu DNA polymerase (Stratagene). The 
cDNA was amplified over 45 cycles consisting of 1 min at 94°C, 1 min at 50°C and 2 min at 72°C. 
The amplified products were separated by electrophoresis, isolated, cloned and sequenced as 
described (Stuyver et al., 1993). 

Type 3a and 3b-specific primers in the NS5 region were selected from the published 
sequences (Mori et al., 1992) as follows: 

for type 3a: 

HCPr161(+): 5'-ACCGGAGGCCAGGAGAGTGATCTCCTCC-3' (SEQ ID NO 63) and 
HCPr162(-): 5'-GGGCTGCTCTATCCTCATCGACGCCATC-3' (SEQ ID NO 64); 

for type 3b: 

HCPr163(+): 5'-GCCAGAGGCTCGGAAGGCGATCAGCGCT-3' (SEQ ID NO 65) and 
HCPr164(-): 5'-GAGCTGCTCTGTCCTCCTCGACGCCGCA-3' (SEQ ID NO 66) 
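
To make the notion of subtype-specific primers concrete, the Python sketch below counts the positions at which two primers of equal length differ. It rests on the assumption, not stated in the text, that the type 3a and type 3b antisense primers HCPr162 and HCPr164 target corresponding genome positions and can therefore be compared base by base. The function name is hypothetical.

```python
def mismatch_positions(primer_a: str, primer_b: str) -> list:
    """1-based positions at which two equal-length primers carry different bases."""
    if len(primer_a) != len(primer_b):
        raise ValueError("primers must have the same length for a base-by-base comparison")
    return [i for i, (a, b) in enumerate(zip(primer_a.upper(), primer_b.upper()), start=1)
            if a != b]


# Antisense primers as printed above (SEQ ID NO 64 and SEQ ID NO 66).
HCPR162 = "GGGCTGCTCTATCCTCATCGACGCCATC"
HCPR164 = "GAGCTGCTCTGTCCTCCTCGACGCCGCA"

if __name__ == "__main__":
    print(mismatch_positions(HCPR162, HCPR164))
```

Under that assumption, several of the differences fall at the 3' end of the primers, which is where mismatches are generally most disruptive to extension by the polymerase.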

Using the Line Probe Assay (LiPA) (Stuyver et al., 1993), seven high-titer type 3 sera 
were selected and subsequently analyzed with the primer sets HCPr161/162 for type 3a, and 
HCPr163/164 for type 3b. None of these sera was positive with the type 3b primers. NS5 
PCR fragments obtained using the type 3a primers from serum BR36 (BR36-23), serum BR33 
(BR33-2) and serum BR34 (BR34-4) were selected for cloning. The following sequences were 
obtained from the PCR fragments: 

From fragment BR34-4: 
BR34-4-20 (SEQ ID NO 1), BR34-4-19 (SEQ ID NO 3) 

From fragment BR36-23: 
BR36-23-18 (SEQ ID NO 5), BR36-23-20 (SEQ ID NO 7) 

From fragment BR33-2: 
BR33-2-17 (SEQ ID NO 9), BR33-2-21 (SEQ ID NO 11) 

An alignment of sequences with SEQ ID NO 1, 5 and 9 with known sequences is given 
in Figure 1. An alignment of the deduced amino acid sequences is shown in Figure 2. The 
3 isolates are very closely related to each other (mutual homologies of about 95%) and to the 
published sequences of type 3a (Mori et al., 1992), but are only distantly related to type 1 
and type 2 sequences (Table 5). Therefore, it is clearly demonstrated that NS5 sequences 
from LiPA-selected type 3 sera are indeed derived from a type 3 genome. Moreover, by 
analyzing the NS5 region of serum BR34, for which no 5'UR sequences were determined as 
described in Stuyver et al. (1993), the excellent correlation between typing by means of the 
LiPA and genotyping as deduced from nucleotide sequencing was further proven. 

Example 2: The Core/E1 region of HCV type 3 

After aligning the sequences of HCV-1 (Choo et al., 1991), HCV-J (Kato et al., 1990), 
HC-J6 (Okamoto et al., 1991), and HC-J8 (Okamoto et al., 1992), PCR primers were chosen 
in those regions of little sequence variation. Primers HCPr23(+): 
5'-CTCATGGGGTACATTCCGCT-3' (SEQ ID NO 67) and HCPr54(-): 
5'-ctattaCCAGTTCATCATCATATCCCA-3' (SEQ ID NO 68) were synthesized on a 392 
DNA/RNA synthesizer (Applied Biosystems). This set of primers was selected to amplify 
the sequence from nucleotide 397 to 957 encoding amino acids 140 to 319 (Kato et al., 1990): 
52 amino acids from the carboxyterminus of core and 128 amino acids of E1 (Kato et al., 
1990). The amplification products BR36-9, BR33-1, and HD10-2 were cloned as described 
(Stuyver et al., 1993). The following clones were obtained from the PCR fragments: 

From fragment HD10-2: 
HD10-2-5 (SEQ ID NO 13), HD10-2-14 (SEQ ID NO 15), HD10-2-21 (SEQ ID NO 17) 

From fragment BR36-9: 
BR36-9-13 (SEQ ID NO 19), BR36-9-20 (SEQ ID NO 21) 

From fragment BR33-1: 
BR33-1-10 (SEQ ID NO 23), BR33-1-19 (SEQ ID NO 25), BR33-1-20 (SEQ ID NO 27) 
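
The correspondence quoted above between nucleotides 397 to 957 and amino acids 140 to 319 can be reproduced with a small Python sketch. Two assumptions are made and should be read as such: nucleotide positions are counted from the first nucleotide of the polyprotein open reading frame (an inference from the ranges quoted in this document, e.g. nucleotides 7932 to 8271 alongside amino acids 2645 to 2757), and the 20-nucleotide sense primer occupies positions 397 to 416, so that newly determined sequence begins at nucleotide 417. The function names are hypothetical.

```python
def codon_number(nt_position: int) -> int:
    """Amino acid (codon) number containing a nucleotide, both counted from 1."""
    if nt_position < 1:
        raise ValueError("nucleotide positions start at 1")
    return (nt_position + 2) // 3


def encoded_residues(nt_start: int, nt_end: int) -> tuple:
    """First and last amino acids whose codons lie entirely within nt_start..nt_end."""
    first = (nt_start + 4) // 3   # first codon starting at or after nt_start
    last = nt_end // 3            # last codon ending at or before nt_end
    if first > last:
        raise ValueError("range does not contain a complete codon")
    return first, last


if __name__ == "__main__":
    print(encoded_residues(417, 957))    # Core/E1 fragment without the sense primer
    print(encoded_residues(7932, 8271))  # NS5B region of Figures 1 and 2
```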

An alignment of the type 3 E1 nucleotide sequences (HD10, BR36, BR33) with SEQ ID 
NO 13, 19 and 23 with known E1 sequences is presented in Figure 4. Four variations were 
detected in the E1 clones from serum HD10 and BR36, while only 2 were found in BR33. 
All are silent third letter variations, with the exception of mutations at position 40 (L to P) 
and 125 (M to I). The homologies of the type 3 E1 region (without core) with type 1 and 2 
prototype sequences are depicted in Table 5. 

In total, 8 clones covering the core/E1 region of 3 different isolates were sequenced and 
the E1 portion was compared with the known genotypes (Table 3) as shown in Figure 5. 
After computer analysis of the deduced amino acid sequence, a signal-anchor sequence at the 
core carboxyterminus was detected which might, through analogy with type 1b (Hijikata et 
al., 1991), promote cleavage before the LEWRN sequence (position 192, Fig. 5). The L-to-P 
mutation in one of the HD10-2 clones resides in this signal-anchor region and potentially 
impairs recognition by signal peptidase (computer prediction). Since no examples of such 
substitutions were found at this position in previously described sequences, this mutation 
might have resulted from reverse transcriptase or Pfu polymerase misincorporation. The 4 
amino-terminal potential N-linked glycosylation sites, which are also present in HCV types 
1a and 2, remain conserved in type 3. The N-glycosylation site in type 1b (aa 250, Kato et 
al., 1990) remains a unique feature of this subtype. All E1 cysteines, and the putative 
transmembrane region (aa 264 to 293, computer prediction) containing the aspartic acid at 
position 279, are conserved in all three HCV types. The following hypervariable regions can 
be delineated: V1 from aa 192 to 203 (numbering according to Kato et al., 1990), V2 (213-223), 
V3 (230-242), V4 (248-257), and V5 (294-303). Such hydrophilic regions are thought 
to be exposed to the host defense mechanisms. This variability might therefore have been 
induced by the host's immune response. Additional putative N-linked glycosylation sites in 
the V4 region in all type 1b isolates known today and in the V5 region of HC-J8 (type 2b) 
possibly further contribute to modulation of the immune response. Therefore, analysis of this 
region, in the present invention, for type 3 and 4 sequences has been instrumental in the 
delineation of epitopes that reside in the V-regions of E1, which will be critical for future 
vaccine and diagnostics development. 
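
The E1 landmarks delineated above lend themselves to a simple lookup. The Python sketch below records the five hypervariable regions and the putative transmembrane region with the boundaries given in the text (numbering according to Kato et al., 1990) and reports which of them, if any, contains a given residue. The data structure and function names are illustrative assumptions only.

```python
# Boundaries as given in the text; positions are 1-based and inclusive.
E1_REGIONS = {
    "V1": (192, 203),
    "V2": (213, 223),
    "V3": (230, 242),
    "V4": (248, 257),
    "transmembrane": (264, 293),
    "V5": (294, 303),
}


def region_of(position: int):
    """Name of the E1 region containing the residue, or None if it lies in none of them."""
    for name, (start, end) in E1_REGIONS.items():
        if start <= position <= end:
            return name
    return None


if __name__ == "__main__":
    for residue in (192, 250, 279, 310):
        print(residue, region_of(residue))
```

Such a lookup could be used, for instance, to annotate which amino acid substitutions between isolates fall inside a variable region.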



Example 3: The NS3/NS4 region of HCV type 3 

For the NS3/NS4 border region, the following sets of primers were selected in the regions 
of little sequence variability after aligning the sequences of HCV-1 (Choo et al., 1991), 
HCV-J (Kato et al., 1990), HC-J6 (Okamoto et al., 1991), and HC-J8 (Okamoto et al., 1992) 
(smaller case lettering is used for nucleotides added for cloning purposes): 

set A: 
HCPr116(+): 5'-ttttAAATACATCATGRCITGYATG-3' (SEQ ID NO 69) 
HCPr66(-): 5'-ctattaTTGTATCCCRCTGATGAARTTCCACAT-3' (SEQ ID NO 70) 

set B: 
HCPr116(+): 5'-ttttAAATACATCATGRCITGYATG-3' (SEQ ID NO 69) 
HCPr118(-): 5'-actagtcgactaYTGIATICCRCTIATRWARTTCCACAT-3' (SEQ ID NO 71) 

set C: 
HCPr117(+): 5'-ttttAAATACATCGCIRCITGCATGCA-3' (SEQ ID NO 72) 
HCPr66(-): 5'-ctattaTTGTATCCCRCTGATGAARTTCCACAT-3' (SEQ ID NO 70) 

set D: 
HCPr117(+): 5'-ttttAAATACATCGCIRCITGCATGCA-3' (SEQ ID NO 72) 
HCPr118(-): 5'-actagtcgactaYTGIATICCRCTIATRWARTTCCACAT-3' (SEQ ID NO 71) 

set E: 
HCPr116(+): 5'-ttttAAATACATCATGRCITGYATG-3' (SEQ ID NO 69) 
HCPr119(-): actagtcgactaRTriGClATlAGCCG-TRTTCATCCAYTGo' (SEQ ID NO 73) 

set F: 
HCPr117(+): 5'-ttttAAATACATCGCIRCITGCATGCA-3' (SEQ ID NO 72) 
HCPr119(-): actagtcgactaRTriGCL\TIAGCCGn-RrrCATCCA^TG-3' (SEQ ID NO 73) 

set G: 
HCPr131(+): 5'-ggaattctagaCCITCITGGGAYGARAYITGGAARTG-3' (SEQ ID NO 74) 
HCPr66(-): 5'-ctattaTTGTATCCCRCTGATGAARTTCCACAT-3' (SEQ ID NO 70) 

set H: 
HCPr130(+): 5'-ggaattctagACIGCITAYCARGCIACIGTITGYGC-3' (SEQ ID NO 75) 
HCPr66(-): 5'-ctattaTTGTATCCCRCTGATGAARTTCCACAT-3' (SEQ ID NO 70) 

set I: 
HCPr134(+): 5'-CATATAGATGCCCACTTCCTATC-3' (SEQ ID NO 76) 
HCPr66(-): 5'-ctattaTTGTATCCCRCTGATGAARTTCCACAT-3' (SEQ ID NO 70) 

set J: 
HCPr131(+): 5'-ggaattctagaCCITCITGGGAYGARAYITGGAARTG-3' (SEQ ID NO 74) 
HCPr118(-): 5'-actagtcgactaYTGIATICCRCTIATRWARTTCCACAT-3' (SEQ ID NO 71) 

set K: 
HCPr130(+): 5'-ggaattctagACIGCITAYCARGCIACIGTITGYGC-3' (SEQ ID NO 75) 
HCPr118(-): 5'-actagtcgactaYTGIATICCRCTIATRWARTTCCACAT-3' (SEQ ID NO 71) 

set L: 
HCPr134(+): 5'-CATATAGATGCCCACTTCCTATC-3' (SEQ ID NO 76) 
HCPr118(-): 5'-actagtcgactaYTGIATICCRCTIATRWARTTCCACAT-3' (SEQ ID NO 71) 

set M: 
HCPr3(+): 5'-GTGTGCCAGGACCATC-3' (SEQ ID NO 77) and 
HCPr4(-): 5'-GACATGCATGTCATGATGTA-3' (SEQ ID NO 78) 

set N: 
HCPr3(+): 5'-GTGTGCCAGGACCATC-3' (SEQ ID NO 77) and 
HCPr118(-): 5'-actagtcgactaYTGIATICCRCTIATRWARTTCCACAT-3' (SEQ ID NO 71) 

set O: 
HCPr3(+): 5'-GTGTGCCAGGACCATC-3' (SEQ ID NO 77) and 
HCPr66(-): 5'-ctattaTTGTATCCCRCTGATGAARTTCCACAT-3' (SEQ ID NO 70) 
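
Several of the primers listed above contain ambiguity symbols (R, Y, W) and inosine (I). The Python sketch below shows how the number of distinct oligonucleotides represented by such a primer can be counted and enumerated. It assumes the standard IUPAC meanings of the symbols and treats inosine as a single universal base that is left unexpanded; both the treatment of inosine and the function names are assumptions made for illustration.

```python
from itertools import product

# Standard IUPAC meanings; "I" (inosine) is kept as one unexpanded universal base.
BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "I": "I",
    "R": "AG", "Y": "CT", "W": "AT", "S": "CG", "K": "GT", "M": "AC", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences represented by a degenerate primer."""
    count = 1
    for symbol in primer.upper():
        count *= len(BASES[symbol])
    return count


def expand(primer: str) -> list:
    """All sequences represented by a degenerate primer (upper case)."""
    return ["".join(choice) for choice in product(*(BASES[s] for s in primer.upper()))]


if __name__ == "__main__":
    hcpr66 = "ctattaTTGTATCCCRCTGATGAARTTCCACAT"  # SEQ ID NO 70 as printed in this document
    print(degeneracy(hcpr66))
    for sequence in expand(hcpr66):
        print(sequence)
```

Enumerating the variants is only practical for low degeneracy; for primers with many ambiguous positions the count alone is usually the more useful figure.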

No PCR products could be obtained with the sets of primers A, B, C, D, E, F, G, H, I, 
J, K, M, and N, on random-primed cDNA obtained from type 3 sera. With the primer set 
O, no fragment could be amplified from type 3 sera. However, a smear containing a few 
weakly stainable bands was obtained from serum BR36. After sequence analysis of several 
DNA fragments, purified and cloned from the area around 300 bp on the agarose gel, only 
one clone, HCC153 (SEQ ID NO 29), was shown to contain HCV information. This 
sequence was used to design primer HCPr152. 

A new primer set P was subsequently tested on several sera. 

set P: 
HCPr152(+): 5'-TACGCCTCTTCTATATCGGTTGGGGCCTG-3' (SEQ ID NO 79) and 
HCPr66(-): 5'-CTATTATTGTATCCCRCTGATGAARTTCCACAT-3' (SEQ ID NO 70) 

The 464-bp HCPr152/66 fragment was obtained from serum BR36 (BR36-20) and serum 
HD10 (HD10-1). The following clones were obtained from these PCR products: 

From fragment HD10-1: 
HD10-1-25 (SEQ ID NO 31), HDIO-IO (SEQ ID NO 33) 

From fragment BR36-20: 
BR36-20-164 (SEQ ID NO 35), BR36-20-165 (SEQ ID NO 37), BR36-20-166 (SEQ ID 
NO 39) 

The nucleotide sequences obtained from clones with SEQ ID NO 29, 31, 33, 35, 37 or 
39 are shown aligned with the sequences of prototype isolates of other types of HCV in 
Figure 6. In addition to one silent 3rd letter variation, one 2nd letter mutation resulted in an 
E to G substitution at position 175 of the deduced amino acid sequence of BR36 (Fig. 7). 
Serum HD10 clones were completely identical. The two type 3 isolates were nearly 94% 
homologous in this NS4 region. The homologies with other types are presented in Table 5. 

Example 4: Analysis of the anti-NS4 response to type-specific peptides 

As the NS4 sequence contains the information for an important epitope cluster, and since 
antibodies towards this region seem to exhibit little cross-reactivity (Chan et al., 1991), it was 
worthwhile to investigate the type-specific antibody response to this region. For each of the 
3 genotypes, HCV-1 (Choo et al., 1991), HC-J6 (Okamoto et al., 1991) and BR36 (present 
invention), three 20-mer peptides were synthesized covering the epitope region between amino 
acids 1688 and 1743 (as depicted in Table 6). The synthetic peptides were applied as parallel 
lines onto membrane strips. Detection of anti-NS4 antibodies and color development were 
performed according to the procedure described for the INNO-LIA HCV Ab II kit 
(Innogenetics, Antwerp). Peptide synthesis was carried out on a 9050 PepSynthesizer 
(Millipore). After incubation with 15 LiPA-selected type 3 sera, 9 samples showed reactivity 
towards NS4 peptides of at least 2 different types, but a clearly positive reaction was 
observed for 3 sera (serum BR33, HD30 and DKH) on the type 3 peptides, while these sera were 
negative (serum BR33 and HD30) or indeterminate (serum DKH) on the type 1 and type 2 NS4 
peptides; 3 sera tested negative for anti-NS4 antibodies (Figure 8). Using the same membrane 
strips coated with the 9 peptides as indicated above and as shown in Figure 8, 38 type 1 sera 
(10 type 1a and 28 type 1b), 11 type 2 sera (10 type 2a and 1 type 2b), 12 type 3a sera and 
2 type 4 sera (as determined by the LiPA procedure) were also tested. As shown in Table 8, 
the sera reacted in a genotype-specific manner with the NS4 epitopes. These results 
demonstrate that type-specific anti-NS4 antibodies can be detected in the sera of some 
patients. Such genotype-specific synthetic peptides might be employed to develop serotyping 
assays, for example a mixture of the nine peptides as indicated above, or combined with the 
NS4 peptides from the HCV type 4 or 6 genotype or from new genotypes corresponding to 
the region between amino acids 1688 and 1743, or synthetic peptides of the NS4 region 
between amino acids 1688 and 1743 of at least one of the 6 genotypes, combined with the E1 
protein or deletion mutants thereof, or synthetic E1 peptides of at least one of the genotypes. 
Such compositions could be further extended with type-specific peptides or proteins, including 
for example the region between amino acids 68 and 91 of the core protein, or more 
preferably the region between amino acids 68 and 78. Furthermore, such type-specific 
antigens may be advantageously used to improve current diagnostic screening and 
confirmation assays and/or HCV vaccines. 
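
The three outcomes described in this example (reactivity with peptides of at least two types, a clearly positive reaction with the peptides of one type only, and no anti-NS4 reactivity) can be expressed as a small decision rule. The Python sketch below is an assumed formalization for illustration; the scoring labels, the handling of indeterminate lines and the function name are hypothetical and are not the interpretation criteria of the INNO-LIA procedure.

```python
def classify_serum(reactivity: dict) -> str:
    """Classify a serum from its per-genotype NS4 peptide reactivity.

    `reactivity` maps a genotype label to 'positive', 'indeterminate' or 'negative'.
    """
    allowed = {"positive", "indeterminate", "negative"}
    if not reactivity or not set(reactivity.values()) <= allowed:
        raise ValueError("each genotype must be scored positive, indeterminate or negative")
    positive = sorted(t for t, score in reactivity.items() if score == "positive")
    if len(positive) >= 2:
        return "cross-reactive"
    if len(positive) == 1:
        # Indeterminate lines for other types do not override one clear positive.
        return "type-specific (" + positive[0] + ")"
    if "indeterminate" in reactivity.values():
        return "indeterminate"
    return "negative"


if __name__ == "__main__":
    # Pattern reported in the text for serum DKH: positive on type 3 peptides,
    # indeterminate on type 1 and type 2 peptides.
    print(classify_serum({"type 1": "indeterminate",
                          "type 2": "indeterminate",
                          "type 3": "positive"}))
```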



Example 5: The Core and E1 regions of HCV type 5 

Sample BE95 was selected from a group of sera that reacted positive in a prototype Line 
Probe Assay as described earlier (Stuyver et al., 1993), because a high titer of HCV RNA 
could be detected, enabling cloning of fragments by a single round of PCR. As no sequences 
from any coding region of type 5 have been disclosed yet, synthetic oligonucleotides for PCR 
amplification were chosen in the regions of little sequence variation after aligning the 
sequences of HCV-1 (Choo et al., 1991), HCV-J (Kato et al., 1990), HC-J6 (Okamoto et 
al., 1991), HC-J8 (Okamoto et al., 1992), and the new type 3 sequences of the present 
invention HD10, BR33, and BR36 (see Figure 5, Example 2). The following sets of primers 
were synthesized on a 392 DNA/RNA synthesizer (Applied Biosystems): 

Set 1: 
HCPr52(+): 5'-atgTTGGGTAAGGTCATCGATACCCT-3' (SEQ ID NO 80) and 
HCPr54(-): 5'-ctattaCCAGTTCATCATCATATCCCA-3' (SEQ ID NO 78) 

Set 2: 
HCPr41(+): 5'-CCCGGGAGGTCTCGTAGACCGTGCA-3' (SEQ ID NO 81) and 
HCPr40(-): 5'-ctattaAAGATAGAGAAAGAGCAACCGGG-3' (SEQ ID NO 82) 

Set 3: 
HCPr41(+): 5'-CCCGGGAGGTCTCGTAGACCGTGCA-3' (SEQ ID NO 81) and 
HCPr54(-): 5'-ctattaCCAGTTCATCATCATATCCCA-3' (SEQ ID NO 78) 
The three sets of primers were employed to amplify the regions of the type 5 isolate PC
as described (Stuyver et al., 1993). Set 1 was used to amplify the E1 region and yielded
fragment PC-4, set 2 was designed to yield the Core region and yielded fragment PC-2. Set
3 was used to amplify the Core and E1 region and yielded fragment PC-3. These fragments
were cloned as described (Stuyver et al., 1993). The following clones were obtained from the
PCR fragments:

From fragment PC-2:
PC-2-1 (SEQ ID NO 41), PC-2-6 (SEQ ID NO 43),

From fragment PC-4:
PC-4-1 (SEQ ID NO 45), PC-4-6 (SEQ ID NO 47),

From fragment PC-3:
PC-3-4 (SEQ ID NO 49), PC-3-8 (SEQ ID NO 51)

An alignment of sequences with SEQ ID NO 41, 43, 45, 47, 49 and 51 is given in Figure
9. A consensus amino acid sequence (PC C/E1: SEQ ID NO 54) can be deduced from each
of the 2 clones cloned from each of the three PCR fragments as depicted in Figure 5, which
overlaps the region between nucleotides 1 and 957 (Kato et al., 1990). The 6 clones are very
closely related to each other (mutual homologies of about 99.7%).
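
The text does not state how the mutual homologies were computed. As a minimal sketch, assuming that "homology" means the percentage of identical positions between two pre-aligned sequences with gap positions skipped, the calculation could look as follows. The function name `percent_identity` and the toy sequences are illustrative and do not come from the specification.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of identical positions between two pre-aligned sequences.

    Assumption: positions where either sequence carries a gap ('-') are
    skipped, and the comparison is case-insensitive.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    identical = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gap positions
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Toy example: 3 mismatches over 957 aligned nucleotides.
    clone_1 = "A" * 957
    clone_2 = "A" * 954 + "CCC"
    print(round(percent_identity(clone_1, clone_2), 1))
```

Under this assumption, a difference of about three nucleotides over the 957-nucleotide Core/E1 region corresponds to a homology of about 99.7%.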

An alignment of nucleotide sequence with SEQ ID NO 53 or 151 (PC C/E1 from isolate
BE95) with known nucleotide sequences from the Core/E1 region is given in Figure 3. The
clone is only distantly related to type 1, type 2, type 3 and type 4 sequences (Table 5).

Example 6: The NS3/NS4 region of HCV type 5

Attempts were undertaken to clone the NS3/NS4 region of the isolate BE95, described in
example 5. The following sets of primers were selected in the regions of little sequence
variability after aligning the sequences of HCV-1 (Choo et al., 1991), HCV-J (Kato et al.,
1990), HC-J6 (Okamoto et al., 1991), and HC-J8 (Okamoto et al., 1992) and of the
sequences obtained from type 3 sera of the present invention (SEQ ID NO 31, 33, 35, 37 and
39); lower-case lettering is used for nucleotides added for cloning purposes:
set A:

HCPr116(+): 5'-ttttAAATACATCATGRCITGYATG-3' (SEQ ID NO 69)
HCPr66(-): 5'-ctattaTTGTATCCCRCTGATGAARTTCCACAT-3' (SEQ ID NO 70)

set B:

HCPr116(+): 5'-ttttAAATACATCATGRCITGYATG-3' (SEQ ID NO 69)
HCPr118(-): 5'-actagtcgactaYTGIATICCRCTIATRWARTTCCACAT-3' (SEQ ID NO 71)

set C:

HCPr117(+): 5'-ttttAAATACATCGCIRCITGCATGCA-3' (SEQ ID NO 72)
HCPr66(-): 5'-ctattaTTGTATCCCRCTGATGAARTTCCACAT-3' (SEQ ID NO 70)

set D:

HCPr117(+): 5'-ttttAAATACATCGCIRCITGCATGCA-3' (SEQ ID NO 72)
HCPr118(-): 5'-actagtcgactaYTGIATICCRCTIATRWARTTCCACAT-3' (SEQ ID NO 71)

set E:

HCPr116(+): 5'-ttttAAATACATCATGRCITGYATG-3' (SEQ ID NO 69)
HCPr119(-): 5'-actagtcgactaRTTIGCIATIAGCCG/TRTTCATCCAYTG-3' (SEQ ID NO 73)

set F:

HCPr117(+): 5'-ttttAAATACATCGCIRCITGCATGCA-3' (SEQ ID NO 72)
HCPr119(-): 5'-actagtcgactaRTTIGCIATIAGCCG/TRTTCATCCAYTG-3' (SEQ ID NO 73)

set G:

HCPr131(+): 5'-ggaattctagaCCITCITGGGAYGARAYITGGAARTG-3' (SEQ ID NO 74)
HCPr66(-): 5'-ctattaTTGTATCCCRCTGATGAARTTCCACAT-3' (SEQ ID NO 70)

set H:

HCPr130(+): 5'-ggaattctagACIGCITAYCARGCIACIGTITGYGC-3' (SEQ ID NO 75)
HCPr66(-): 5'-ctattaTTGTATCCCRCTGATGAARTTCCACAT-3' (SEQ ID NO 70)

set I:

HCPr134(+): 5'-CATATAGATGCCCACTTCCTATC-3' (SEQ ID NO 76)
HCPr66(-): 5'-ctattaTTGTATCCCRCTGATGAARTTCCACAT-3' (SEQ ID NO 70)

set J:

HCPr131(+): 5'-ggaattctagaCCITCITGGGAYGARAYITGGAARTG-3' (SEQ ID NO 74)
HCPr118(-): 5'-actagtcgactaYTGIATICCRCTIATRWARTTCCACAT-3' (SEQ ID NO 71)

set K:

HCPr130(+): 5'-ggaattctagACIGCITAYCARGCIACIGTITGYGC-3' (SEQ ID NO 75)
HCPr118(-): 5'-actagtcgactaYTGIATICCRCTIATRWARTTCCACAT-3' (SEQ ID NO 71)

set L:

HCPr134(+): 5'-CATATAGATGCCCACTTCCTATC-3' (SEQ ID NO 76)
HCPr118(-): 5'-actagtcgactaYTGIATICCRCTIATRWARTTCCACAT-3' (SEQ ID NO 71)

set M:

HCPr3(+): 5'-GTGTGCCAGGACCATC-3' (SEQ ID NO 77) and
HCPr4(-): 5'-GACATGCATGTCATGATGTA-3' (SEQ ID NO 78)

set N:

HCPr3(+): 5'-GTGTGCCAGGACCATC-3' (SEQ ID NO 77) and
HCPr118(-): 5'-actagtcgactaYTGIATICCRCTIATRWARTTCCACAT-3' (SEQ ID NO 71)

set O:

HCPr3(+): 5'-GTGTGCCAGGACCATC-3' (SEQ ID NO 77) and
HCPr66(-): 5'-ctattaTTGTATCCCRCTGATGAARTTCCACAT-3' (SEQ ID NO 70)
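
Several of the primers above contain degenerate positions. As a hedged illustration, assuming the standard IUPAC ambiguity codes (R = A or G, Y = C or T, W = A or T, K = G or T) and treating I as inosine, a single base that is kept as-is rather than expanded, the number of distinct oligonucleotides in a primer mixture can be counted as sketched below. The helper names `degeneracy` and `expand` are hypothetical and are not part of the specification.

```python
from itertools import product

# Standard IUPAC ambiguity codes used by the primers in this example.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "W": "AT", "K": "GT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences in a degenerate primer mixture.

    Assumption: 'I' (inosine) is one physical base and adds no degeneracy.
    """
    count = 1
    for base in primer.upper():
        if base == "I":
            continue
        count *= len(IUPAC[base])
    return count


def expand(primer: str) -> list:
    """List every concrete sequence of a degenerate primer (inosine kept as 'I')."""
    pools = [("I",) if base == "I" else tuple(IUPAC[base]) for base in primer.upper()]
    return ["".join(combo) for combo in product(*pools)]


if __name__ == "__main__":
    hcpr66 = "ctattaTTGTATCCCRCTGATGAARTTCCACAT"
    hcpr118 = "actagtcgactaYTGIATICCRCTIATRWARTTCCACAT"
    print(degeneracy(hcpr66), degeneracy(hcpr118))
    for sequence in expand(hcpr66):
        print(sequence)
```

Under these assumptions HCPr66, with two R positions, is a mixture of 4 sequences, while HCPr118, with one Y, three R and one W position, is a mixture of 32.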

No PCR products could be obtained with the sets of primers A, B, C, D, E, F, G,
H, I, J, K, L, M, and N, on random-primed cDNA obtained from type 3 sera. However,
set O yielded what appeared to be a PCR artifact fragment estimated at about 1450 base
pairs, instead of the expected 628 base pairs. Although it is not expected that PCR artifact
fragments contain information of the gene or genome that was targeted in the experiment,
efforts were made to clone this artifact fragment, which was designated fragment PC-1.
The following clones were obtained from fragment PC-1:

PC-1-37 (SEQ ID NO 59 and SEQ ID NO 55), PC-1-48 (SEQ ID NO 61 and SEQ ID NO 57)

The sequences obtained from the 5' and 3' ends of the clones are given in SEQ ID NOS
55, 57, 59, and 61, and the complete sequences with SEQ ID NO 197 and 199 are shown
aligned with the sequences of prototype isolates of other types of HCV in Figure 10 and the
alignment of the deduced amino acid sequences is shown in Figure 11 and 7. Surprisingly,
the PCR artifact clone contained HCV information. The positions of the sequences within the
HCV genome are compatible with a contiguous HCV sequence of 1437 nucleotides, which
was the estimated size of the cloned PCR artifact fragment. Primer HCPr66 primed correctly
at the expected position in the HCV genome. Therefore, primer HCPr3 must have
incidentally misprimed at a position 809 nucleotides upstream of its legitimate position in the
HCV genome. This could not have been anticipated, since no sequence information was available from
a coding region of type 5.
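
The size argument in the preceding paragraph is simple arithmetic: if the reverse primer anneals at its expected site and the forward primer anneals further upstream, the product grows by exactly that upstream distance. A minimal sketch of the check, using only the figures quoted in the text (the function name `product_length` is illustrative):

```python
def product_length(expected_bp: int, upstream_shift_bp: int) -> int:
    """Length of a PCR product when the forward primer anneals
    `upstream_shift_bp` nucleotides upstream of its intended site and
    the reverse primer anneals where expected."""
    return expected_bp + upstream_shift_bp


if __name__ == "__main__":
    expected = 628   # expected size for primer set O, in base pairs
    shift = 809      # upstream mispriming distance of HCPr3
    print(product_length(expected, shift))
```

The result, 1437 nucleotides, agrees with the contiguous HCV sequence found in the clones and with the estimate of about 1450 base pairs for the fragment.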

Example 7: The E2 region of HCV type 5

Serum BE95 was chosen for experiments aimed at amplifying a part of the E2 region of HCV 
type 5. 

After aligning the sequences of HCV-1 (2), HCV-J (1), HC-J6 (3), and HC-J8 (4), PCR
primers were chosen in those regions of little sequence variation.

Primers HCPr109(+): 5'-TGGGATATGATGATGAACTGGTC-3' (SEQ ID NO 141) and
HCPr14(-): 5'-CCAGGTACAACCGAACCAATTGCC-3' (SEQ ID NO 142) were combined
to amplify the aminoterminal region of the E2/NS1 region, and were synthesized on a 392
DNA/RNA synthesizer (Applied Biosystems). With primers HCPr109 and HCPr14, a PCR
fragment of 661 bp was generated, containing 169 nucleotides corresponding to the E1
carboxyterminus and 492 bases from the region encoding the E2 aminoterminus.

An alignment of the type 5 E1/E2 sequences with SEQ ID NO 158 with known sequences is
presented in Figure 10. The deduced protein sequence was compared with the different
genotypes (Fig. 12, amino acids 328-546). In the E1 region, no additional structurally
important motifs were found. The aminoterminal part of E2 was hypervariable when compared
with the other genotypes. All 6 N-glycosylation sites and all 7 cysteine residues were
conserved in this E2 region. To preserve alignment, it was necessary to introduce a gap
between aa 474 and 475 as for type 3a, but not between aa 480 and 481, as for type 2.

Example 8: The NS5b region of HCV type 4

Type 4 sera GB48, GB116, GB215, and GB358, selected by means of the line probe assay
(LiPA, Stuyver et al., 1993), as well as sera GB549 and GB809 that could not be typed by
means of this LiPA (only hybridization was observed with the universal probes), were
selected from Gabonese patients. All these sera were positive after the first round of PCR
reactions for the 5' untranslated region (Stuyver et al., 1993) and were retained for further
study.

RNA was isolated from the sera and cDNA synthesized as described in example 1.
Universal primers in the NS5 region were selected after alignment of the published sequences
as follows:

HCPr206(+): 5'-TGGGGATCCCGTATGATACCCGCTGCTTTGA-3'
(SEQ ID NO. 124) and

HCPr207(-): 5'-GGCGGAATTCCTGGTCATAGCCTCCGTGAA-3'
(SEQ ID NO. 125);

and were synthesized on a 392 DNA/RNA synthesizer (Applied Biosystems). Using the Line
Probe Assay (LiPA), four high-titer type 4 sera and 2 sera that could not be classified were
selected and subsequently analyzed with the primer set HCPr206/207. NS5 PCR fragments
obtained using these primers from serum GB48 (GB48-3), serum GB116 (GB116-3), serum
GB215 (GB215-3), serum GB358 (GB358-3), serum GB549 (GB549-3), and serum GB809
(GB809-3), were selected for cloning. The following sequences were obtained from the PCR
fragments:

From fragment GB48-3: GB48-3-10 (SEQ ID NO. 106)
From fragment GB116-3: GB116-3-5 (SEQ ID NO. 108)
From fragment GB215-3: GB215-3-8 (SEQ ID NO. 110)
From fragment GB358-3: GB358-3-3 (SEQ ID NO. 112)
From fragment GB549-3: GB549-3-6 (SEQ ID NO. 114)
From fragment GB809-3: GB809-3-1 (SEQ ID NO. 116)

An alignment of nucleotide sequences with SEQ ID NO. 106, 108, 110, 112, 114, and 116
with known sequences is given in Figure 1. An alignment of deduced amino acid sequences
with SEQ ID NO. 107, 109, 111, 113, 115, and 117 with known sequences is given in Figure
2. The 4 isolates that had been typed as type 4 by means of LiPA are very closely related to
each other (mutual homologies of about 95%), but are only distantly related to type 1, type
2, and type 3 sequences (e.g. GB358 shows homologies of 65.6 to 67.7% with other
genotypes, Table 4). The sequences obtained from sera GB549 and GB809 also show similar
homologies with genotypes 1, 2, and 3 (65.9 to 68.8% for GB549 and 65.0 to 68.5% for
GB809, Table 4), but an intermediate homology of 79.7 to 86.8% (often observed between
subtypes of the same type) exists between GB549 or GB809 and the group of isolates
consisting of GB48, GB116, GB215, and GB358, or between GB549 and GB809. These data
indicate the discovery of 3 new subtypes within HCV genotype 4: in the present
invention, these 3 subtypes are designated subtype 4c, represented by isolates GB48, GB116,
GB215, and GB358, subtype 4g, represented by isolate GB549, and subtype 4e, represented
by isolate GB809. Although the homologies observed between subtypes in the NS5 region
seem to indicate a closer relationship between subtypes 4c and 4e, the homologies observed
in the E1 region indicate that subtypes 4g and 4e show the closest relation (see example 8).

Example 9: The Core/E1 region of HCV type 4

From each of the 3 new type 4 subtypes, one representative serum was selected for cloning
experiments in the Core/E1 region. GB549 (subtype 4g) and GB809 (subtype 4e) were
analyzed together with isolate GB358 that was chosen from the subtype 4c group.
Synthetic oligonucleotides:

After aligning the sequences of HCV-1 (2), HCV-J (1), HC-J6 (3), and HC-J8 (4), PCR
primers were chosen in those regions of little sequence variation.

Primers
HCPr52(+): 5'-atgTTGGGTAAGGTCATCGATACCCT-3',
HCPr23(+): 5'-CTCATGGGGTACATTCCGCT-3', and
HCPr54(-): 5'-CTATTACCAGTTCATCATCATATCCCA-3'
were synthesized on a 392 DNA/RNA
synthesizer (Applied Biosystems). The sets of primers HCPr23/54 and HCPr52/54 were used,
but PCR fragments could be obtained only with the primer set HCPr52/54. This set of
primers amplified the sequence from nucleotide 379 to 957 encoding amino acids 127 to 319:
65 amino acids from the carboxyterminus of core and 128 amino acids of E1. The
amplification products GB358-4, GB549-4, and GB809-4 were cloned as described in example
1. The following clones were obtained from the PCR fragments:
From fragment GB358-4: GB358-4-1 (SEQ ID NO 118)
From fragment GB549-4: GB549-4-3 (SEQ ID NO 120)
From fragment GB809-4: GB809-4-3 (SEQ ID NO 122)
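
The coordinates quoted above for the HCPr52/54 amplicon (nucleotides 379 to 957 encoding amino acids 127 to 319) can be checked with a short calculation. This is a sketch under two assumptions: nucleotide 1 is the first base of the polyprotein start codon, and core ends at amino acid 191, the cleavage site between residues 191 and 192 that the specification mentions in Example 11. The function names are illustrative.

```python
def codon_number(nt_position: int) -> int:
    """Amino acid number encoded by the codon containing `nt_position`.

    Assumption: nucleotide 1 is the first base of the start codon.
    """
    if nt_position < 1:
        raise ValueError("positions are 1-based")
    return (nt_position - 1) // 3 + 1


def split_at_cleavage(first_aa: int, last_aa: int, last_core_aa: int = 191):
    """Count residues on each side of the core/E1 cleavage site."""
    core = max(0, min(last_aa, last_core_aa) - first_aa + 1)
    e1 = max(0, last_aa - max(first_aa, last_core_aa + 1) + 1)
    return core, e1


if __name__ == "__main__":
    first, last = codon_number(379), codon_number(957)
    print(first, last)                     # amino acid range of the amplicon
    print(split_at_cleavage(first, last))  # residues from core and from E1
```

With these assumptions the range maps to amino acids 127 to 319, of which 65 fall in core and 128 in E1, in agreement with the figures in the text.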

An alignment of the type 4 Core/E1 nucleotide sequences with SEQ ID NO. 118, 120, and 122
with known sequences is presented in Figure 4. The homologies of the type 4 E1 region
(without core) with type 1, type 2, type 3, and type 5 prototype sequences are depicted in
Table 4. Homologies of 53 to 66% are observed with representative isolates of non-type 4
genotypes. Observed homologies in the E1 region within type 4, between the different
subtypes, range from 75.2 to 78.4%. The recently disclosed sequences of the core region
of Egyptian type 4 isolates (for example EG-29 in Figure 3) described by Simmonds et al.
(1993) do not allow alignment with the Gabonese sequences (as described in the present
invention) in the NS5 region and may belong to different type 4 subtype(s), as can be
deduced from the core sequences. The deduced amino acid sequences with SEQ ID NO 119,
121, and 123 are aligned with other prototype sequences in Figure 5. Again, type-specific
variation mainly resides in the variable V regions, designated in the present invention, and
therefore, type-4-specific amino acids or V regions will be instrumental in diagnosis and
therapeutics for HCV type 4.

Example 10: The Core/E1 and NS5b regions of new HCV type 2, 3 and 4 subtypes

Samples NE92 (subtype 2d), BE98 (subtype 3c), CAM600 and GB809 (subtype 4e),
CAMG22 and CAMG27 (subtype 4f), GB438 (subtype 4h), CAR4/1205 (subtype 4i),
CAR1/501 (subtype 4j), CAR1/901 (subtype 4?), and GB724 (subtype 4?) were selected from
a group of sera that reacted positive but aberrantly in a prototype Line Probe Assay as
described earlier (Stuyver et al., 1993). Another type 5a isolate BE100 was also analyzed in
the C/E1 region, and yet another type 5a isolate BE96 in the NS5b region. A high titer of
HCV RNA could be detected, enabling cloning of fragments by a single round of PCR. As
no sequences from any coding region of these subtypes had been disclosed yet, synthetic
oligonucleotides for PCR amplification were chosen in the regions of little sequence variation
after aligning the sequences of HCV-1 (Choo et al., 1991), HCV-J (Kato et al., 1990), HC-J6
(Okamoto et al., 1991), HC-J8 (Okamoto et al., 1992), and the other new sequences of the
present invention.
The above mentioned sets 1, 2 and 3 of primers (Example 5) were used, but only with
set 1 could PCR fragments be obtained from all isolates (except for BE98, GB724 and
CAR1/501). This set of primers amplified the sequence from nucleotide 379 to 957 encoding
amino acids 127 to 319: 65 amino acids from the carboxyterminus of core and 128 amino
acids of E1. With set 3, the core/E1 region from isolates NE92 and BE98 could be amplified,
and with set 2, the core region of GB358, GB724, GB809, and CAM600 could be amplified.
The amplification products were cloned as described in example 1. The following clones were
obtained from the PCR fragments:

From isolate GB724, the clone with SEQ ID NO 193 from the core region,
From isolate NE92, the clone with SEQ ID NO 143,
From isolate BE98, the clone from the core/E1 region of which part of the sequence has been
analyzed and is given in SEQ ID NO 147,
From isolate CAM600, the clone with SEQ ID NO 167 from the E1 region, or SEQ ID NO
165 from the Core/E1 region as shown in Figure 3,
From isolate CAMG22, the clone with SEQ ID NO 171 from the E1 region as shown in
Figure 4,
from isolate GB358, the clone with SEQ ID NO 191 in the core region,
from isolate CAMG27, the clone with SEQ ID NO 173 from the core/E1 region,
from isolate GB438, the clone with SEQ ID NO 177 from the core/E1 region,
from isolate CAR4/1205, the clone with SEQ ID NO 179 from the core/E1 region,
from isolate CAR1/901, the clone with SEQ ID NO 181 from the core/E1 region,
from isolate GB809, the clone GB809-4 with SEQ ID NO 189 from the core/E1 region,
clone GB809-2 with SEQ ID NO 169 from the core/E1 region and the clone with SEQ ID
NO 163 from the core region,
and from isolate BE100, the clone with SEQ ID NO 155 from the Core/E1 region as shown
in Figure 4.

An alignment of these Core/E1 sequences with known Core/E1 sequences is presented in
Figure 4. The deduced amino acid sequences with SEQ ID NO 144, 148, 164, 168, 170, 172,
174, 178, 180, 182, 190, 192, 194, 156, 166 are aligned with other prototype sequences in
Figure [illegible]. Again, type-specific variation mainly resides in the variable V regions designated
in the present invention, and therefore, type 2d, 3c and type 4-specific amino acids or V
regions will be instrumental in diagnosis and therapeutics for HCV type (subtype) 2d, 3c or
the different type 4 subtypes.
The NS5b region of isolates NE92, BE98, CAM600, CAMG22, GB438, CAR4/1205,
CAR1/501, and BE96 was amplified with primers HCPr206 and HCPr207 (Table 7). The
corresponding clones were cloned and sequenced as in example 1 and the corresponding
sequences (of which BE98 was partly sequenced) received the following identification
numbers:

NE92: SEQ ID NO 145 
BE98: SEQ ID NO 149 
CAM600: SEQ ID NO 201
CAMG22: SEQ ID NO 203
GB438: SEQ ID NO 207
CAR4/1205: SEQ ID NO 209
CAR1/501: SEQ ID NO 211
BE95: SEQ ID NO 159 
BE96: SEQ ID NO 161 

An alignment of these NS5b sequences with known NS5b sequences is presented in Figure
1. The deduced amino acid sequences with SEQ ID NO 146, 150, 202, 204, 206, 208, 210,
212, 160, 162 are aligned with other prototype sequences in Figure 2. Again, subtype-specific
variations can be observed, and therefore, type 2d, 3c and type 4-specific amino acids or V
regions will be instrumental in diagnosis and therapeutics for HCV type (subtype) 2d, 3c or
the different type 4 subtypes.

Example 11: Genotype-specific reactivity of anti-E1 antibodies (Serotyping)

E1 proteins were expressed from vaccinia virus constructs containing a core/E1 region
extending from nucleotide positions 355 to 978 (Core/E1 clones described in previous
examples including the primers HCPr52 and HCPr54), and expressed proteins from L119
(after the initiator methionine) to W326 of the HCV polyprotein. The expressed protein was
modified upon expression in the appropriate host cells (e.g. HeLa, RK13, HuTK-, HepG2)
by cleavage between amino acids 191 and 192 of the HCV polyprotein and by the addition
of high-mannose type carbohydrate motifs. Therefore, a 30 to 32 kDa glycoprotein could be
observed on western blot by means of detection with serum from patients with hepatitis C.

As a reference, a genotype 1b clone obtained from the isolate HCV-B was also expressed
in an identical way as described above, and was expressed from recombinant vaccinia virus
vvHCV-11A.

A panel of 104 genotyped sera was first tested for reactivity with a cell lysate containing
type 1b protein expressed from the recombinant vaccinia virus vvHCV-11A, and compared
with cell lysate of RK13 cells infected with a wild type vaccinia virus ('E1/WT'). The lysates
were coated as a 1/20 dilution on a normal ELISA microtiter plate (Nunc Maxisorb) and left
to react with a 1/20 dilution of the respective sera. The panel consisted of 14 type 1a, 38
type 1b, 21 type 2, 21 type 3a, and 9 type 4 sera. Human antibodies were subsequently
detected by a goat anti-human IgG conjugated with peroxidase and the enzyme activity was
detected. The optical density obtained with the E1 lysate was divided by that obtained with
the wild type lysate, and a ratio of 2 was taken as the cut-off. The results are given in
Table A. Eleven out of 14 type 1a sera (79%), 25 out of 38 type 1b sera (66%), 6 out of 21
type 2 sera (29%), and 5 out of 21 type 3a sera (24%) reacted, while none of
the 9 type 4 sera or the type 5 serum reacted (0%). These experiments clearly show the high
prevalence of anti-E1 antibodies reactive with the type 1 E1 protein in patients infected with
type 1 (36/52 (69%)) (either type 1a or type 1b), but the low prevalence or absence in non-
type 1 sera (11/52 (21%)).
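
The scoring rule described above (signal on the E1 lysate divided by signal on the wild type lysate, with 2 as the cut-off) can be sketched as follows. The ratios are the type 1a values listed in Table A. Whether a ratio of exactly 2 counts as reactive is not stated, so the strict comparison used here is an assumption, and the names `reactive_sera` and `percent` are illustrative.

```python
CUTOFF = 2.0  # E1/WT ratio taken as the cut-off in the text

# E1/WT ratios of the 14 type 1a sera as listed in Table A.
TYPE_1A = {
    "3748": 3.15, "3807": 3.51, "5282": 1.99, "9321": 3.12, "9324": 2.76,
    "9325": 6.12, "9326": 10.56, "9356": 1.79, "9388": 3.5, "8366": 10.72,
    "8380": 2.27, "10925": 4.02, "10936": 5.04, "10938": 1.36,
}


def reactive_sera(ratios: dict, cutoff: float = CUTOFF) -> list:
    """Sera whose E1/WT ratio exceeds the cut-off (strictly, by assumption)."""
    return [serum for serum, ratio in ratios.items() if ratio > cutoff]


def percent(part: int, whole: int) -> int:
    """Percentage rounded to the nearest integer."""
    return round(100 * part / whole)


if __name__ == "__main__":
    positives = reactive_sera(TYPE_1A)
    print(len(positives), len(TYPE_1A), percent(len(positives), len(TYPE_1A)))
    # Pooled figures quoted in the text: type 1 (1a + 1b) and non-type 1 sera.
    print(percent(11 + 25, 14 + 38), percent(6 + 5 + 0 + 0, 21 + 21 + 9 + 1))
```

Applied to the type 1a rows, the rule gives 11 reactive sera out of 14 (79%), and the pooled counts reproduce the 69% and 21% quoted in the text.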



TABLE A

serum         E1/WT

type 1a
3748          3.15
3807          3.51
5282          1.99
9321          3.12
9324          2.76
9325          6.12
9326          10.56
9356          1.79
9388          3.5
8366          10.72
8380          2.27
10925         4.02
10936         5.04
10938         1.36

type 1b
5205          2.25
5222          1.33
5246          1.24
5250          13.58
5493          0.87
5573          1.75
8243          1.77
8244          2.05
8316          1.21
8358          5.04
9337          14.47
9410          5
9413          5.51
10905         1.26
10919         5.00
10928         8.72
10929         8.26
10931         2.3
10932         4.41
44            2.37
45            3.14
46            4.37
47            5.68
48            2.97
49            1.18
50            9.85
51            4.51
52            1.11
53            5.20
54            0.98
55            1.48
56            1.06
57            3.85
58            7.6
59            3.28
60            3.23
61            7.82
62            1.92

type 4
[illegible]   0.8[illegible]
[illegible]   [illegible]
GB113         0.68
GB116         0.73
GB215         0.5[illegible]
GB358         0.56
[illegible]   [illegible]
GB438         1.08
GB516         1.04

type 5
BE95          0.86



Core/E1 clones of isolates BR36 (type 3a) and BE95 (type 5a) were subsequently recombined
into the viruses vvHCV-62 and vvHCV-63, respectively. A genotyped panel of sera was
subsequently tested on cell lysates obtained from RK13 cells infected with the recombinant
viruses vvHCV-62 and vvHCV-63. Tests were carried out as described above and the results
are given in the table below (TABLE B). From these results, it can clearly be seen that,
although some cross-reactivity occurs (especially between type 1 and 3), the obtained values
of a given serum are usually higher on its homologous E1 protein than on an E1 protein of
another genotype. For type 5 sera, none of the 5 sera were reactive on type 1 or 3 E1
proteins, while 3 out of 5 were shown to contain anti-E1 antibodies when tested on their
homologous type 5 protein. Therefore, in this simple test system, a considerable number of
sera can already be serotyped. Combined with the reactivity to type-specific NS4 epitopes or
epitopes derived from other type-specific parts of the HCV polyprotein, a serotyping assay
may be developed for discriminating the major types of HCV. To overcome the problem of
cross-reactivity, the position of cross-reactive epitopes may be determined by someone skilled
in the art (e.g. by means of competition of the reactivity with synthetic peptides), and the
epitopes evoking cross-reactivity may be left out of the composition to be included in the
serotyping assay or may be included in sample diluent to outcompete cross-reactive
antibodies.
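
The observation that a serum usually gives its highest value on the homologous E1 protein suggests a simple decision rule, sketched below under stated assumptions: each serum has one E1/WT ratio per antigen, the antigen with the highest ratio is reported, and a serum is called only if that ratio exceeds the cut-off of 2 used above. The function name `serotype` and the ratios in the usage lines are made-up illustrations, not data from the tables, and the sketch does not address the cross-reactivity discussed in the text.

```python
from typing import Dict, Optional


def serotype(ratios: Dict[str, float], cutoff: float = 2.0) -> Optional[str]:
    """Return the antigen type with the highest E1/WT ratio, or None.

    Assumptions: `ratios` maps an antigen label (e.g. "1b", "3a", "5a")
    to the E1/WT ratio measured for one serum; sera whose best ratio
    does not exceed `cutoff` are left untyped.
    """
    if not ratios:
        raise ValueError("at least one ratio is required")
    best_type, best_ratio = max(ratios.items(), key=lambda item: item[1])
    return best_type if best_ratio > cutoff else None


if __name__ == "__main__":
    # Hypothetical ratios, for illustration only.
    print(serotype({"1b": 9.0, "3a": 3.0, "5a": 1.0}))   # highest on type 1b
    print(serotype({"1b": 0.8, "3a": 0.9, "5a": 4.5}))   # highest on type 5a
    print(serotype({"1b": 0.8, "3a": 0.9, "5a": 1.1}))   # below cut-off
```

A rule of this kind would leave non-reactive sera untyped, which is why the text proposes combining E1 reactivity with type-specific NS4 or other epitopes.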
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1 , 

TABLE B

serum         E1/WT         E1/WT         E1/WT

type 1b
8316          0.89          0.59          0.30
8358          2.22          2.65          1.96
9337          1.59          0.96          0.93
9410          16.32         9.60          3.62
9413          9.89          2.91          2.85
10905         1.04          0.96          1.05
10919         3.17          2.56          2.96
10928         4.39          2.28          2.07
10929         2.95          2.07          2.08
10931         3.11          1.49          2.11
5             0.86          0.86          0.96
6             3.48          1.32          1.32
7             6.76          4.00          3.77
8             10.88         3.44          4.04
9             1.76          1.88          1.58
10            9.88          7.48          7.20
11            8.48          8.99          8.45
12            [illegible]   [illegible]   0.76
13            [illegible]   [illegible]   5.37
14            [illegible]   [illegible]   11.22
15            [illegible]   1.62          1.65

type 3
8332          3.39          4.22          0.66
10907         3.24          4.39          0.96
10908         0.99          0.94          0.98
10934         0.86          0.90          0.90
[illegible]   [illegible]   [illegible]   2.44
[illegible]   [illegible]   0.80          0.86
[illegible]   [illegible]   6.66          1.17
[illegible]   [illegible]   1.29          1.22
[illegible]   [illegible]   4.11          0.98
[illegible]   0.83          2.16          1.04

type 5
[illegible]   0.78          0.95          1.54
BE110         0.79          1.01          4.95
BE95          0.47          0.52          0.65
BE111         0.71          0.75          8.53
BE112         1.01          1.27          2.37
BE113         1.11          1.35          1.60

Table 5. Homologies of new HCV sequences with other known HCV types



Region 
(nucleotides) 


Isolate 


la 

HCV-I 


lb 

HCV-; 


2a 

HC-J6 


2b 
HC-J8 


3a 

Tl T 


3b 

T9 TIO 


Core no73^ 










o^.-t \oy.u) 






El (574-957) 


HDIO (3) 

BR33 (3) 
PC (5) 
GB 3 58 (4a) 
GB549 (4b) 
GB809 (4c) 


61.5 (68.0) 

60.7 (67.2) 

61.4 (64.0) 

62.5 (69.1) 
66.0 (72.2) 
63.3 (69.1) 


64.6 (68.3) 

ox. J / ..^ 

63.3 (68.0) 

62.4 (64.8) 
62.8 (65.9) 
62.8 (69.8) 

60.7 (64.3) 


57.3 (55.5) 

JQ.J \JJ.7) 

56.5 (54.7) 
54.1 (49.6) 

59.4 (54.0) 
59.1 (56.4) 
56.7 (53.2) 


56.3 (59.4) 

56.0 (58.6) 

53.3 (47.2) 

54.4 (54.0) 

56.5 (54.0) 
53.0 (51.6) 






NS3 

(3856-4209) 


PC (5) 


74.7 (89) 


76.1 (86.4) 


76.1 (89.3) 


78.0 (89.0) 






(4892-5292) 


BRj6 (3 ) 
KD 10 (3) 


67 S (IR 5"! 
69.3 (74.6) 


69.8 (75.1) 
66.6 (69.7) 


62.0 (67.5) 
57.3 (59.9) 


59.1 (59.9) 






NS4 

(4936-5292) 


PC (5) 


61.3 (62.2) 


63.0 (65.5) 


52.9 (46.2) 


54.3 (43.7) 






NS5b 

(8023-8235) 


BRJ4 (3) 
BRj6 (3) 
BR33 (3) 
GB35S (4a) 
GB549 (4b) 
GBS09 (4c) 


65.7 
64.3 
65.7 

67.7 (76.1) 

65.8 (76.1) 
68.5 (73.5) 


66.7 
67.6 
67.1 

65.6 (77.0) 
67.1 (77.0) 
65.0 (73.5) 


63,9 
64.3 
64.3 

66.5 (70.3) 
65.9 (71.7) 
67.7 (69.9) 


64.3 
66." 
64.3 

65.6 (71.7) 
65.9 (74.4) 

67.7 (73.5) 


94.3 93.9 
94.8 93.4 
94.8 93.9 


75.6 77.0 
75.1 76.5 
76.0 77.5 



Shown are the nucleotide homologies (the amino-acid homology is given between brackets)
for the region indicated in the left column.

Table 6. NS4 sequences of the different genotypes



prototype 



TYPE 



SYNTHEHC PEPTIDE SS4-1 



SYNTHETIC PEPTIDE N^^^ 
(NS4b) 



SYMJh nc PEPHDE NS-i 
CNS4b) 



posiuon- > 



I I 

6 7 

9 0 

0 0 



HCV-i 



HCV.J 



HC-J6 



HC-J8 



BRj6 



PC 



la 



LSG KP.^flPDREV LYEEFDE 



lb 



LSG RPAVIPDREV LYQEFDE 



SQHLPYIEQ GNO^AEQFKS 5^ 



L\EQFK2 KALGLLQTAS RQa 



:a VNO RAVVAPDKEV LYE.-VFDE 



2b 



3a 



LND RVVV'APDKEI LYE.\FDE 



LGG KP.\IVPDKEV LYQ(? YDE 



LSG KP.AJIPDREA LYQ2 FDE 



.ASHLPYIEQ GMQU\EQFKQ ?: 



ASR.^IE£ GQRL\£ML53 K 



ASK.\JdJEE GQRM.^^MUKS.?: 



SQ^\PMEQ AQVLAHQFKT :< 



.\iSL?VMDE BAIAGQFKZK 



l-\EQFKQ KALGLLQTAT KQA 



L\EMLr4 KIQGLLQ^AS KQa 



M.^EMUCS KI2GLLQ2AT RQa 



l-kCqr^ KI'LCnSTTG QK.' 



residues conserved in every genotype. Underlined amino acids are type-specific, amino
acids in italics are unique to type 3 and 5 sequences.






wo 94/25601 ^ PCT/EP94/0D23 


Table 7

SEQ ID NO   Primer NO (polarity)   Sequence from 5' to 3'

63          HCPr161(-)             5'-ACCGGAGGCCAGCAGAGTGATCTCCTCC-3'
64          HCPr162(-)             5'-GGGCTGCTCTATCCTCATCGACGCCATC-3'
65          HCPr163(-)             5'-GCCAGAGGCTCGGAAGGCGATCAGCGCT-3'
66          HCPr164(-)             5'-GAGCTGCTCTGTCCTCCTCGACGCCGCA-3'
67          HCPr23(+)              5'-CTCATGGGGTACATTCCGCT-3'
68          HCPr54(-)              5'-CTATTACCAGTTCATCATCATATCCCA-3'
69          HCPr116(+)             5'-ttttAAATACATCATGRCITGYATG-3'
70          HCPr66(-)              5'-ctattaTTGTATCCCRCTGATGAARTTCCACAT-3'
71          HCPr118(-)             5'-actagtcgactaYTGIATICCRCTIATRWARTTCCACAT-3'
72          HCPr117(+)             5'-ttttAAATACATCGCIRCITGCATGCA-3'
73          HCPr119(-)             5'-actagtcgactaRTTIGCIATIAGCCKRTTCATCCAYTG-3'
74          HCPr131(+)             5'-ggaattctagaCCITCITGGGAYGARAYITGGAARTG-3'
75          HCPr130(+)             5'-ggaattctagACIGCITAYCARGCIACIGTITGYGC-3'
76          HCPr134(+)             5'-CATATAGATGCCCACTTCCTATC-3'
77          HCPr3(+)               5'-GTGTGCCAGGACCATC-3'
78          HCPr4(-)               5'-GACATGCATGTCATGATGTA-3'
79          HCPr152(+)             5'-TACGCCTCTTCTATATCGGTTGGGGCCTG-3'
80          HCPr52(+)              5'-atgTTGGGTAAGGTCATCGATACCCT-3'
81          HCPr41(+)              5'-CCCGGGAGGTCTCGTAGACCGTGCA-3'
82          HCPr40(-)              5'-ctattaAAGATAGAGAAAGAGCAACCGGG-3'
124         HCPR206                5'-tggggatcccgtatgatacccgctgctttga-3'
125         HCPR207                5'-ggcggaattcctggtcatagcctccgtgaa-3'
141         HCPR109                5'-tgggatatgatgatgaactggtc-3'
142         HCPR14                 5'-ccaggtacaaccgaaccaattgcc-3'
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